# Creating Head Visuals
This document describes how premium users with API access can create custom head visuals. This feature enables you to create personalized visual representations of digital humans.

A head visual is a fundamental requirement for successfully deploying a new digital human. It comprises a video that is preprocessed by the UNITH synthesis engine. This video is then used as the basis for defining a head visual object, which is effectively the face used for a digital human.

> **Note:** Once a head visual is created, multiple digital humans can share a single head visual ID.

## Important Considerations

* **Content policy:** UNITH reserves the right to remove any head visual that is deemed offensive, harmful, or inappropriate.
* **Video best practices:** Before you begin, please refer to our separate documentation on best practices for creating idle videos for digital humans. This will ensure optimal results.
* **Maximum video length:** The maximum supported video length for head visual creation is currently 20 seconds.

Our synthesis pipeline works with 16:9 HD videos (1280x720p) at 25 fps. More detail on video best practices can be found in *Video Guidelines for Avatar Creation* in our video guidelines.

## Process Overview

Creating a custom head visual involves the following steps:

1. Uploading the source video
2. Creating the head visual resource
3. Saving the head visual
4. Assigning the head visual to your organization

## API Endpoints

### 1. Upload Video

* **Endpoint:** `/video/upload`
* **Method:** `POST`
* **Description:** Uploads the video source for the head visual.

**Request headers**

* `accept: application/json`
* `Authorization: Bearer <yourAuthBearerToken>` (replace `<yourAuthBearerToken>` with your actual authorization token)
* `Content-Type: multipart/form-data`

**Request body**

* `file`: the video file to upload (e.g., `video.mp4`). The `file` parameter name is important.

```bash
curl -X 'POST' \
  'https://platform-api.unith.ai/video/upload' \
  -H 'accept: application/json' \
  -H 'Authorization: Bearer yourAuthBearerToken' \
  -H 'Content-Type: multipart/form-data' \
  -F 'file=@/path/to/your/video.mp4' # replace /path/to/your/video.mp4
```
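As an illustration, the upload request can also be assembled in Python with the popular `requests` library. The endpoint, header names, and `file` field mirror the curl example above; the helper names (`build_upload_request`, `upload_video`) are our own and not part of any official UNITH SDK.

```python
import os

# Base URL as shown in the curl examples in this guide.
API_BASE = "https://platform-api.unith.ai"

def build_upload_request(video_path, bearer_token):
    """Return (url, headers, files) for the /video/upload call."""
    url = f"{API_BASE}/video/upload"
    headers = {
        "accept": "application/json",
        "Authorization": f"Bearer {bearer_token}",
        # Do NOT set Content-Type manually here: requests generates the
        # multipart/form-data boundary itself when files= is used.
    }
    # 'file' is the required multipart field name.
    files = {
        "file": (os.path.basename(video_path), open(video_path, "rb"), "video/mp4")
    }
    return url, headers, files

def upload_video(video_path, bearer_token):
    """Upload the video and return the temporary video token."""
    import requests  # assumed installed: pip install requests
    url, headers, files = build_upload_request(video_path, bearer_token)
    resp = requests.post(url, headers=headers, files=files, timeout=120)
    resp.raise_for_status()
    return resp.json()["token"]
```

The returned token is what you pass, in the next step, via the `x-head-video-token-id` header.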
**Response**

* **Status code:** 200 (OK)

```json
{
  "token": "temporary-video-token" // the temporary token for the uploaded video
}
```

**Response parameters**

* `token` (string): a temporary, unique token representing the uploaded video. This token is required for the next step.

**Error handling**

The endpoint returns standard HTTP error codes for invalid requests, upload failures, or server errors. Ensure your request is correctly formatted and the video file is valid.

### 2. Create Head Visual

* **Endpoint:** `/head-visual/create`
* **Method:** `POST`
* **Description:** Creates a new head visual resource from the uploaded video.

**Request headers**

* `accept: application/json`
* `x-head-video-token-id: <yourTemporaryVideoToken>` (replace `<yourTemporaryVideoToken>` with the token from the `/video/upload` response)
* `Authorization: Bearer <yourAuthBearerToken>` (replace `<yourAuthBearerToken>` with your authorization token)
* `Content-Type: application/json`

**Request body**

```json
{
  "update": false,
  "detector_version": "v2",
  "detector_threshold": 0.2,
  "mode": "default",
  "cut_timestamp": 0.1,
  "debug": false
}
```

**Request parameters**

* `update` (boolean): indicates whether to update an existing head visual (set to `false` for new). Don't change.
* `detector_version` (string): the version of the face detection algorithm to use. Use `"v2"` for best results.
* `detector_threshold` (number): the threshold for face detection. Don't change.
* `mode` (string): the processing mode. `"default"` is the standard mode. Don't change.
* `cut_timestamp` (number): the timestamp for cutting the video. Don't change.
* `debug` (boolean, optional): if set to `true`, the response will include a task ID. If video processing fails, a zip file containing frames and face detection results will be provided for debugging.

**curl example**

```bash
curl -X 'POST' \
  'https://platform-api.unith.ai/head-visual/create' \
  -H 'accept: application/json' \
  -H 'x-head-video-token-id: yourTemporaryVideoToken' \
  -H 'Authorization: Bearer yourAuthBearerToken' \
  -H 'Content-Type: application/json' \
  -d '{
    "update": false,
    "detector_version": "v2",
    "detector_threshold": 0.2,
    "mode": "default",
    "cut_timestamp": 0.1,
    "debug": false
  }'
```
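Since most of these fields should be left at their defaults, it can help to build the body and headers in code. A minimal sketch, assuming Python: the field names are reconstructed as snake_case from this guide and should be verified against the current API reference, and the function names are hypothetical.

```python
# Hypothetical helper: build the /head-visual/create request body with the
# defaults documented above. Field names are assumed snake_case; confirm
# against the current API reference.
def build_create_payload(debug=False):
    return {
        "update": False,            # keep False when creating a new head visual
        "detector_version": "v2",   # recommended face-detector version
        "detector_threshold": 0.2,  # don't change
        "mode": "default",          # don't change
        "cut_timestamp": 0.1,       # don't change
        "debug": debug,             # True -> response includes a task ID
    }

def build_create_headers(video_token, bearer_token):
    """Headers for /head-visual/create, per the curl example above."""
    return {
        "accept": "application/json",
        "x-head-video-token-id": video_token,
        "Authorization": f"Bearer {bearer_token}",
        "Content-Type": "application/json",
    }
```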
**Response**

* **Status code:** 200 (OK)

```json
{
  "data": {
    "id": "yourNewHeadVisualId",  // the unique ID of the new head visual
    "task_id": "yourTaskId"       // the ID of the processing task (only if debug=true)
  }
}
```

**Response parameters**

* `id` (string): the unique identifier for the newly created head visual. This ID is used in subsequent steps.
* `task_id` (string, optional): the ID of the video processing task. This is only included if the `debug` parameter was set to `true` in the request.

**Error handling**

The endpoint returns standard HTTP error codes for invalid requests, missing headers, or server errors.

### 3. Save Head Visual

* **Endpoint:** `/head-visual/save`
* **Method:** `POST`
* **Description:** Saves the head visual resource with the specified metadata.

**Request headers**

* `accept: application/json`
* `Authorization: Bearer <yourAuthBearerToken>` (replace `<yourAuthBearerToken>` with your authorization token)
* `Content-Type: application/json`

**Request body**

```json
{
  "id": "yourNewHeadVisualId",        // the head visual ID from the /head-visual/create response
  "name": "yourUniqueHeadVisualName", // a unique name for the head visual
  "gender": "male",                   // "male" or "female"
  "type": "talk"                      // the type of head visual
}
```

**Request parameters**

* `id` (string, required): the ID of the head visual to save (obtained from the `/head-visual/create` response).
* `name` (string, required): a unique name for the head visual. This name must be unique within your organization.
* `gender` (string, required): the gender of the digital human. Use either `"male"` or `"female"`.
* `type` (string, required): the type of head visual. Typically, this is `"talk"`.

**curl example**

```bash
curl -X 'POST' \
  'https://platform-api.unith.ai/head-visual/save' \
  -H 'accept: application/json' \
  -H 'Authorization: Bearer yourAuthBearerToken' \
  -H 'Content-Type: application/json' \
  -d '{
    "id": "yourNewHeadVisualId",
    "name": "yourUniqueHeadVisualName",
    "gender": "female",
    "type": "talk"
  }'
```
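Because a non-unique name or an invalid `gender` value will be rejected by the API, it is worth validating the body client-side first. A minimal sketch in Python, enforcing only the constraints listed above (the helper name and the client-side check itself are our own, not UNITH-provided):

```python
# Hypothetical client-side validation of the /head-visual/save body,
# based on the parameter constraints documented in this guide.
VALID_GENDERS = {"male", "female"}
VALID_TYPES = {"talk"}  # "talk" is the only type mentioned in this guide

def build_save_payload(head_visual_id, name, gender, type_="talk"):
    """Validate inputs and return the request body for /head-visual/save."""
    if not head_visual_id:
        raise ValueError("id is required (from the /head-visual/create response)")
    if not name:
        raise ValueError("name is required and must be unique in your organization")
    if gender not in VALID_GENDERS:
        raise ValueError(f"gender must be one of {sorted(VALID_GENDERS)}")
    if type_ not in VALID_TYPES:
        raise ValueError(f"type must be one of {sorted(VALID_TYPES)}")
    return {"id": head_visual_id, "name": name, "gender": gender, "type": type_}
```

Note that uniqueness of `name` can only be confirmed by the API itself; this check merely catches empty or malformed values before the request is sent.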
**Response**

* **Status code:** 200 (OK)
* **Response body:** an empty string.

**Important notes**

* The `name` parameter must be unique. Choose a descriptive and unique name for your head visual.
* This endpoint may take some time to process, depending on the length of the uploaded video.
* The head visual status will initially be `pending` until video processing is complete. You may need to check the status of the head visual separately if you need to confirm processing is done.
* The URL in the response body will be empty, unless `debug` was set to `true`, in which case the URL of the debug zip file is returned.

**Error handling**

The endpoint returns standard HTTP error codes for invalid requests, missing parameters, or an invalid head visual ID. It will also return an error if the chosen name is not unique.

## Simple Head Visual Post-Processing Guide

This section outlines an optional, step-by-step procedure for manually post-processing your source videos to achieve a custom idle video. It assumes you have recorded your model according to the best practices described in the "Creating Head Visuals" documentation and have a video with a green-screen background.

### General Post-Processing Procedure (Default Idle Loop)

This procedure focuses on creating a short, seamless idle loop for a default head visual.

#### 1. Creating the seamless idle loop

The goal is to create a short, natural-looking idle segment (under 5 seconds) that can be seamlessly looped. The total length of your final video must be shorter than 10 seconds.

* **Select software:** Open your captured video in editing software (e.g., DaVinci Resolve, Adobe Premiere).
* **Identify loop points:** Find a brief segment of the video (ideally less than 5 seconds) where the model's movement (e.g., head movement, eye blinking) is natural and smooth. Avoid brisk or sudden movements, as these are highly visible when looping.
* **Reverse and duplicate:** Cut the selected segment, duplicate it, and reverse the speed of the duplicated clip. By appending the reversed clip to the original, you create a perfect loop where the start and end frames match, ensuring a natural transition.
* **Blink placement:** Try to capture one blink during the video, preferably around the middle of the recording, not at the beginning or end. This helps create a more relaxed and natural appearance.
* **First and last frames:** Avoid body or head movement in the first or last frame. We want smooth movement, and movement here makes the loop transition more noticeable.

#### 2. Keying and background removal

* **Key the green screen:** Use keying tools (such as those available in After Effects or DaVinci Resolve) to accurately remove the green-screen background from the subject.

#### 3. Adding a custom background

* **Select background:** Add a custom background of your choice behind the keyed subject.
* **Avoid distraction:** If you use a video background, ensure the movement or activity is minimal. This background video could also be turned into a loop to blend it better. This prevents a noticeable change when transitioning from the static idle state to the active speaking state.

#### 4. Color correction and final adjustments

* **Color match:** Perform color correction on the foreground (the model) to ensure the lighting and color tone seamlessly match the new background layer. This can be done in any professional editing software.

#### 5. Exporting the final video

The final video must adhere to the following specifications for processing by our synthesis pipeline, as mentioned in the "Creating Head Visuals" documentation:

* **Resolution:** 1280x720p (16:9 HD)
* **Frame rate:** 25 frames per second (25 fps)
* **Format:** MP4
* **Duration:** less than 10 seconds total
* **Size:** 3 MB maximum
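If you prefer the command line over an editor, the reverse-and-duplicate trick (step 1) and the export spec (step 5) can also be scripted with ffmpeg. The sketch below only builds the ffmpeg argument lists without running them; the file paths are placeholders, and it assumes a video-only (no audio) clip:

```python
# Sketch: ffmpeg commands for the "reverse and duplicate" loop (step 1)
# and the export spec (step 5). Commands are built, not executed, here.
def reverse_clip_cmd(segment, reversed_out):
    # The 'reverse' video filter buffers the entire clip in memory,
    # which is fine for a sub-5-second segment.
    return ["ffmpeg", "-i", segment, "-vf", "reverse", "-an", reversed_out]

def concat_loop_cmd(segment, reversed_clip, loop_out):
    # Append the reversed clip to the original so the first and last
    # frames match, producing a seamless loop.
    return [
        "ffmpeg", "-i", segment, "-i", reversed_clip,
        "-filter_complex", "[0:v][1:v]concat=n=2:v=1:a=0[v]",
        "-map", "[v]", loop_out,
    ]

def export_cmd(loop_in, final_out):
    # Enforce the pipeline spec: 1280x720, 25 fps, MP4, no audio track.
    return ["ffmpeg", "-i", loop_in,
            "-vf", "scale=1280:720,fps=25", "-an", final_out]
```

Run the three commands in order (for example via `subprocess.run`) to go from a trimmed segment to a spec-compliant looping MP4; keying and color correction still belong in your editing software.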
### Key Difference: Two-Loops Video Input

When creating a video for the two-loops (more expressive) head visual, the format consists of a single video of up to 10 seconds, and the creation process is simpler:

* **Single continuous video:** You do not need to manually cut and reverse the video (step 1 is skipped).
* **Video structure:** The input video is a single, continuous recording where the first half is the still idle state (defined by the `cut_timestamp`) and the second half is the expressive state (e.g., the subject moving hands or changing facial expression).
* **System handles looping:** When creating the head visual in two-loops mode, you specify the `cut_timestamp` where the transition between the idle and speaking states occurs. Our system automatically handles the necessary looping and inversion for both the idle and expressive segments to ensure seamless, non-jarring transitions.

This distinction is crucial: for two loops, your editing work is focused purely on keying, background, and color correction, as the platform manages the looping mechanism.

* In the first part of the video, follow the same recommendations as for the default idle loop format to ensure a seamless infinite loop.
* Include one blink in the first 4 seconds, and another between second 4 and the end. This enhances realism and avoids a robotic look.
* As before, avoid noticeable body or head movement in the first or last frame to generate a smooth loop.

### Using AI Generation Tools

You have the freedom to use a variety of tools, including AI-based solutions, for video creation, source image generation, or face swapping. However, please be aware:

* **Training data:** Our model was trained on real human video footage, and real video may deliver better performance than visibly AI-generated content.
* **Face detection:** Our synthesis model relies on accurate face detection in every frame of the video. Ensure that any post-processing or AI generation does not interfere with the clarity or consistency of the subject's face.

Happy creating!
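As a closing illustration of the two-loops structure described above, the sketch below shows how the `cut_timestamp` splits the input into idle and expressive segments, and where the recommended blink windows fall. The segment math is our own interpretation of the description in this guide, not official API behavior:

```python
# Illustration only: how a two-loops input divides at cut_timestamp, per
# the structure described in this guide. Not an official API behavior.
def two_loops_segments(duration_s, cut_timestamp_s):
    """Return ((idle_start, idle_end), (expr_start, expr_end)) in seconds."""
    if not 0 < cut_timestamp_s < duration_s:
        raise ValueError("cut_timestamp must fall inside the video")
    if duration_s > 10:
        raise ValueError("a two-loops input must be at most 10 seconds")
    return (0.0, cut_timestamp_s), (cut_timestamp_s, duration_s)

def blink_windows(duration_s):
    """Suggested blink windows: one in the first 4 s, one from 4 s to the end."""
    split = min(4.0, duration_s)
    return (0.0, split), (split, duration_s)
```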