# Creating Head Visuals: Expressive Digital Humans
## Feature Overview

The Idle & Active Video Segments feature lets you bring more life and realism to your digital human by using different visual states for moments of stillness versus moments of speech. By providing distinct segments for "idle" and "speaking," your digital human can display subtle, natural behavior when waiting, and more expressive, engaging behavior when responding.

## Why This Matters

Using the same video segment for all states can look repetitive and reduce realism. This feature allows:

- Smooth, natural idle behavior that feels alive, not frozen
- Expressive facial and body movements during speech
- A more human-like presence that keeps the viewer engaged

## How It Works for You (Configuration)

1. **Upload a source video.** The video should include both a calm idle portion and a more active speaking portion. Make sure there is a clear moment where the idle segment ends and the active segment begins.
2. **Mark the transition point.** Identify the exact timestamp (in seconds) where the change from idle to speaking occurs. This ensures that the first and last frames of each segment match, giving a continuous, natural visual flow.
3. **Apply to your digital human.** Once configured, the system will automatically use the idle segment when your digital human is waiting, and the active segment when it is speaking.

### Video Guidelines

- Start with minimal movement and a relaxed facial expression.
- Transition naturally into speaking with expressive gestures or facial changes.
- Keep the environment, lighting, and framing consistent for both segments.

### Key Considerations

- **Timestamp accuracy:** the transition moment should be precise for seamless switching between states.
- **Video quality:** high-resolution, well-lit footage creates more lifelike results.
- **Preview:** test the setup to ensure smooth transitions and natural presentation.

## Video Processing Details: Idle to Speaking State

This section describes how the idle and speech video segments are generated from the uploaded video and the provided cut timestamp. An illustrative sketch of this logic follows the Video Requirements list below.

### Idle Video Generation

Once the cut timestamp (t) is set, the system takes the video from 0 to t. This segment (0 to t) is then used to create an infinite loop by reversing the video (playing it forward, then backward).

### Speech Video Generation

The speech video is generated based on the duration of the audio to be spoken. If the audio duration is, for example, 5 seconds, the system takes a segment of the video from t to t + 2.5 seconds (half the audio duration). This segment (t to t + 2.5 s) is then reversed and appended to itself to create a 5-second video that ends on the same frame as the cut timestamp (t).

### Ensuring a Seamless Transition

The first and last frames of the idle video are the same as the first and last frames of the speech video. This ensures that there are no visible skips or jumps between the idle and speech states, resulting in a smooth transition.

## Prerequisites

Before implementing this feature, ensure you have:

- **API access:** valid access to the UNITH API.
- **Video recording:** a video recorded according to the guidelines in our "Best Practices for Video Recording" documentation.

## Video Requirements

- **Continuous recording:** the video should be a continuous recording that includes both the inactive and active states.
- **Idle state:** the video should begin with the actor in an idle state (e.g., looking at the camera with minimal movement).
- **Active state:** the video should then transition to the active state, where the actor is speaking, raising their eyebrows, gesturing, or otherwise being expressive.
- **Clear transition:** the transition between the idle and active states should be clear and well defined.
- **Timestamp:** you must identify the precise timestamp (in seconds) where the transition from the idle state to the active state occurs. This timestamp is crucial for configuring the feature.
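The sketch below is not part of the UNITH API; it is a purely illustrative reconstruction of the server-side segment logic described above, with hypothetical function names, showing how the idle loop and the speech clip are assembled from the cut timestamp and the audio duration.

```python
# Illustrative sketch of the segment logic described above.
# The platform performs this processing server-side; all names here are hypothetical.

def idle_loop_plan(cut_t: float) -> list[tuple[float, float]]:
    """One cycle of the idle loop: play 0 -> t forward, then the same
    frames reversed (t -> 0). Repeating this cycle loops seamlessly."""
    return [(0.0, cut_t), (cut_t, 0.0)]  # (start, end); end < start means reversed playback

def speech_clip_plan(cut_t: float, audio_duration: float) -> list[tuple[float, float]]:
    """Speech clip: play t -> t + d/2 forward, then the same segment reversed,
    so the clip lasts the full audio duration and ends back on the frame at t."""
    half = audio_duration / 2.0
    return [(cut_t, cut_t + half), (cut_t + half, cut_t)]

if __name__ == "__main__":
    # With cut_t = 3 s and 5 s of audio, the speech video plays 3.0 -> 5.5 s
    # forward, then 5.5 -> 3.0 s reversed: 5 s total, ending on the frame at t.
    print(idle_loop_plan(3.0))         # [(0.0, 3.0), (3.0, 0.0)]
    print(speech_clip_plan(3.0, 5.0))  # [(3.0, 5.5), (5.5, 3.0)]
```

Both plans meet on the frame at the cut timestamp t, which is why the switch between the idle and speech states shows no visible jump.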
### Example Video Scenario

Imagine a video where an actor starts by looking directly at the camera, calmly and still. At the 3-second mark, the actor begins to speak and uses hand gestures. In this case, the cut timestamp would be "3".

## Implementation Steps

The following steps outline how to implement the Two Loops feature using the API.

### 1. Upload Video

- **Endpoint:** `/video/upload`
- **Method:** `POST`
- **Description:** uploads the video containing both the idle and active states.

**Request headers:**

- `accept: application/json`
- `authorization: Bearer <YOUR_BEARER_TOKEN>` (replace `<YOUR_BEARER_TOKEN>` with your actual bearer token)
- `content-type: multipart/form-data`

**Request body:**

- `file`: the video file to upload (e.g., `my_video.mp4`)

**cURL example:**

```bash
curl -X 'POST' \
  'https://platform.api.unith.ai/video/upload' \
  -H 'accept: application/json' \
  -H 'authorization: Bearer YOUR_BEARER_TOKEN' \
  -H 'content-type: multipart/form-data' \
  -F 'file=@/path/to/your/video.mp4'
```

Replace `/path/to/your/video.mp4` with the actual path to your video file.

**Response:**

- **Status code:** 200 (OK)
- **Response body:**

```json
{
  "token": "temporary_video_token"
}
```

**Response parameters:**

- `token` (string): a temporary token representing the uploaded video. This token is used in the next step.

### 2. Create Head Visual

- **Endpoint:** `/head_visual/create`
- **Method:** `POST`
- **Description:** creates a new head visual resource, configuring it for the Two Loops feature.

**Request headers:**

- `accept: application/json`
- `x-head-video-token-id: <YOUR_TEMPORARY_VIDEO_TOKEN>` (replace `<YOUR_TEMPORARY_VIDEO_TOKEN>` with the token from the `/video/upload` response)
- `authorization: Bearer <YOUR_BEARER_TOKEN>` (replace `<YOUR_BEARER_TOKEN>` with your authorization token)
- `content-type: application/json`

**Request body:**

```json
{
  "update": false,
  "detector_version": "v2",
  "detector_threshold": 0.2,
  "mode": "two_loops",
  "cut_timestamp": 3, // replace with the actual timestamp in seconds
  "debug": false
}
```

**Request parameters:**

- `update` (boolean): set to `false` when creating a new head visual.
- `detector_version` (string): use `"v2"` for optimal results.
- `detector_threshold` (number): the threshold for face detection.
- `mode` (string): set to `"two_loops"` to enable the Two Loops feature.
- `cut_timestamp` (number, required): the timestamp (in seconds) where the video transitions from the idle state to the active state. This is a crucially important parameter.
- `debug` (boolean, optional): if set to `true`, the response will include a task ID. If video processing fails, a ZIP file containing frames and face-detection results will be provided for debugging.

**cURL example:**

```bash
curl -X 'POST' \
  'https://platform.api.unith.ai/head_visual/create' \
  -H 'accept: application/json' \
  -H 'x-head-video-token-id: YOUR_TEMPORARY_VIDEO_TOKEN' \
  -H 'authorization: Bearer YOUR_BEARER_TOKEN' \
  -H 'content-type: application/json' \
  -d '{
    "update": false,
    "detector_version": "v2",
    "detector_threshold": 0.2,
    "mode": "two_loops",
    "cut_timestamp": 3,
    "debug": false
  }'
```

Replace `YOUR_TEMPORARY_VIDEO_TOKEN` with the token from the `/video/upload` response, and `3` with the actual timestamp.

**Response:**

- **Status code:** 200 (OK)
- **Response body:**

```json
{
  "data": {
    "id": "YOUR_NEW_HEAD_VISUAL_ID",
    "task_id": "YOUR_TASK_ID" // only if debug is true
  }
}
```

**Response parameters:**

- `id` (string): the unique ID of the new head visual. This ID is used in the next step.
- `task_id` (string, optional): the ID of the video processing task (only included if `debug` is `true`).
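If you prefer scripting to cURL, the following minimal sketch chains steps 1 and 2 with Python's `requests` library. The endpoints, headers, and body fields are those documented above; the `BASE_URL` constant, the helper names, and the error handling are illustrative assumptions.

```python
# Minimal sketch of steps 1-2 (pip install requests).
# Endpoints, headers, and fields follow the documentation above; helper names are hypothetical.
import requests

BASE_URL = "https://platform.api.unith.ai"
BEARER_TOKEN = "YOUR_BEARER_TOKEN"  # replace with your actual bearer token

def upload_video(path: str) -> str:
    """Step 1: upload the source video and return the temporary video token."""
    with open(path, "rb") as f:
        resp = requests.post(
            f"{BASE_URL}/video/upload",
            headers={
                "accept": "application/json",
                "authorization": f"Bearer {BEARER_TOKEN}",
            },
            # requests builds the multipart/form-data header (with boundary) itself
            files={"file": f},
        )
    resp.raise_for_status()
    return resp.json()["token"]

def create_head_visual(video_token: str, cut_timestamp: float) -> str:
    """Step 2: create the head visual in two_loops mode and return its ID."""
    resp = requests.post(
        f"{BASE_URL}/head_visual/create",
        headers={
            "accept": "application/json",
            "x-head-video-token-id": video_token,
            "authorization": f"Bearer {BEARER_TOKEN}",
        },
        json={
            "update": False,
            "detector_version": "v2",
            "detector_threshold": 0.2,
            "mode": "two_loops",
            "cut_timestamp": cut_timestamp,  # seconds; must match the idle-to-active transition
            "debug": False,
        },
    )
    resp.raise_for_status()
    return resp.json()["data"]["id"]
```

A matching sketch for step 3 appears at the end of this page.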
### 3. Save Head Visual

- **Endpoint:** `/head_visual/save`
- **Method:** `POST`
- **Description:** saves the new head visual resource.

**Request headers:**

- `accept: application/json`
- `authorization: Bearer <YOUR_AUTH_BEARER_TOKEN>` (replace `<YOUR_AUTH_BEARER_TOKEN>` with your authorization token)
- `content-type: application/json`

**Request body:**

```json
{
  "id": "YOUR_NEW_HEAD_VISUAL_ID", // the head visual ID from the /head_visual/create response
  "name": "my_new_visual",         // a unique name for this head visual
  "gender": "male",                // the gender of the digital human
  "type": "talk"                   // the type of head visual
}
```

**Request parameters:**

- `id` (string, required): the ID of the head visual to save (obtained from the `/head_visual/create` response).
- `name` (string, required): a unique name for the head visual.
- `gender` (string, required): the gender of the digital human (e.g., `"male"`, `"female"`).
- `type` (string, required): the type of head visual (e.g., `"talk"`).

**cURL example:**

```bash
curl -X 'POST' \
  'https://platform.api.unith.ai/head_visual/save' \
  -H 'accept: application/json' \
  -H 'authorization: Bearer YOUR_AUTH_BEARER_TOKEN' \
  -H 'content-type: application/json' \
  -d '{
    "id": "YOUR_NEW_HEAD_VISUAL_ID",
    "name": "my_new_visual",
    "gender": "male",
    "type": "talk"
  }'
```

Replace `YOUR_NEW_HEAD_VISUAL_ID` and `my_new_visual` with the actual values. A scripted version of this step appears at the end of this page.

**Response:**

- **Status code:** 200 (OK)

### 4. Use the New Head Visual

To use your new head visual, simply select it when creating a new digital human.

## Important Considerations

- **Video quality:** the quality of your source video is critical for achieving good results with the Two Loops feature. Refer to our video recording best practices for guidelines.
- **Cut timestamp accuracy:** the `cut_timestamp` parameter must be accurate. An incorrect timestamp will result in a jarring or unnatural transition between the idle and active states.
- **Testing:** thoroughly test your digital human with the Two Loops feature to ensure the transitions are smooth and the behavior is as expected.
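To complete the scripted flow, here is a hedged sketch of step 3. It reuses the hypothetical `upload_video` and `create_head_visual` helpers defined in the earlier sketch; only the endpoint, headers, and body fields come from this documentation.

```python
# Continuation of the earlier sketch: step 3, saving the head visual.
import requests

BASE_URL = "https://platform.api.unith.ai"
BEARER_TOKEN = "YOUR_AUTH_BEARER_TOKEN"  # replace with your authorization token

def save_head_visual(head_visual_id: str, name: str,
                     gender: str = "male", visual_type: str = "talk") -> None:
    """Step 3: save the head visual so it can be selected for a digital human."""
    resp = requests.post(
        f"{BASE_URL}/head_visual/save",
        headers={
            "accept": "application/json",
            "authorization": f"Bearer {BEARER_TOKEN}",
            "content-type": "application/json",
        },
        json={
            "id": head_visual_id,   # from the /head_visual/create response
            "name": name,           # a unique name for this head visual
            "gender": gender,
            "type": visual_type,
        },
    )
    resp.raise_for_status()  # a 200 (OK) status means the visual was saved

# Example end-to-end run, assuming the helpers from the earlier sketch:
# video_token = upload_video("/path/to/your/video.mp4")
# head_visual_id = create_head_visual(video_token, cut_timestamp=3)
# save_head_visual(head_visual_id, "my_new_visual")
```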