Create a Digital Human

8 min

to create a digital human you will need a user and organisation depending on your organization's type and privileges, you will have access to various head visuals see user docid\ a6pxi9wvcn3uxnytwj7ja for more information components all digital humans are made up of the following required components name alias face voice operating mode prior to starting the creation process, it is important to consider what operating mode your digital human should use this is directly related to the intended use case operating modes digital humans can operate in 5 distinct modes text to video / video specify text for a given digital human, and a video will be generated with the digital human speaking the text the output is an mp4 file open dialogue configure a prompt for a given digital human, and the digital human will be conversational the output is a hosted, conversational digital human document / knowledge base provide content and configure a prompt, and the digital human will be conversational about the content provided the output is a hosted, conversational digital human voiceflow build a workflow using voiceflow, and the digital human will be able to guide you through the conversational workflow the output is a hosted, conversational digital human following voiceflow conversation plugin mode leverage a webhook to connect any custom conversational engine or llm to power the conversation of the digital human the output is a hosted, conversational digital human digital human creation process all digital humans need an existing head visual and voice to see your available head visuals, see list faces docid\ vpzbfsclk26xbcillnvdg and list voices docid\ ujeqesnyxa j9j2 z6jtl documentation names and aliases are free text fields which are used for personalization there are no restictions apart from being required fields use the post /head/create to create digital humans each operation mode requires a different request body (see configuration parameters docid\ qi6kwaiqafhoaqaan5nkg ) doc qa { "headvisualid" "\<stringidretrievablefromlistfaces>", "name" "test qa", "alias" "ai responder", "languagespeechrecognition" "en us", "language" "en us", "operationmode" "doc qa", "promptconfig" { "system prompt" "string" }, "ttsprovider" "elevenlabs", "ocprovider" "playground", "ttsvoice" "jessica eleven turbo v2 5", "greetings" "hi there!", "phrases" \["unith","barcelona"], "customwords" {"barcelona" "barthelona","ai" "a eye"} } ttt { "headvisualid" "\<stringidretrievablefromlistfaces>", "name" "ttt", "alias" "repeater", "languagespeechrecognition" "en us", "language" "en us", "ttsprovider" "audiostack", "operationmode" "ttt", "ocprovider" "playground", "ttsvoice" "coco", "greetings" "i repeate what you say!" } oc { "headvisualid" "\<stringidretrievablefromlistfaces>", "name" "testoc", "alias" "open ai salesperson", "languagespeechrecognition" "en us", "language" "en us", "operationmode" "oc", "promptconfig" { "system prompt" "string" }, "ttsprovider" "elevenlabs", "ocprovider" "playground", "ttsvoice" "jessica eleven turbo v2 5", "greetings" "hi there!", "phrases" \["unith","barcelona"], "customwords" {"barcelona" "barthelona","ai" "a eye"} } voiceflow { "headvisualid" "\<stringidretrievablefromlistfaces>", "name" "guide", "alias" "ai guider", "languagespeechrecognition" "en us", "language" "en us", "ttsprovider" "audiostack", "operationmode" "voiceflow", "voiceflowapikey" "string", "ocprovider" "playground", "ttsvoice" "coco", "greetings" "hi there!" } plugin { "headvisualid" "\<stringidretrievablefromlistfaces>", "name" "guide", "alias" "ai guider", "languagespeechrecognition" "en us", "language" "en us", "ttsprovider" "elevenlabs", "ocprovider" "playground", "ttsvoice" "jessica eleven turbo v2 5", "operationmode" "plugin", "pluginoperationalmodeconfig" { "name" "\<plugin name>", "url" "\<plugin url>", "options" { "\<optional param>" "option value" } }, "greetings" "hi there!" } the payload can contain many additional properties of the digital human as described in the configuration parameters docid\ qi6kwaiqafhoaqaan5nkg page if you have access to multiple organsations, you will need to add "orgid" "\<orgid>" to the payload request body parameters parameter data type description headvisualid string head visual id the unique identifier of the visual representation (the "look") chosen for the digital human, selected from a list of available head visuals orgid string organization id the unique identifier of your organization this parameter is used to associate the digital human with your organization account suggestions string used in "document based question answering" and open dialogue;( operationmode=doc qa , operationmode=oc allows manual overriding or adding to automatically extracted suggestions from uploaded documents format as a json array string, e g , \["example1", "example2"] name string the system generated name for the digital human, derived from the alias it's generally recommended to leave this parameter untouched alias string the user defined, short, and unique name for your digital human this alias will be used to construct the public name of the digital human languagespeechrecognition string sets the language for speech recognition, using codes supported by microsoft azure speech services refer to \[ microsoft azure documentation ]for a full list of supported languages language string frontend language sets the language for the user interface and frontend elements of the digital human to get the list of currently supported languages please use the endpoint get/languages/all ttsprovider string text to speech provider specifies the service used for text to speech generation out of the box support for "elevenlabs" "azure" "audiostack" operationmode string operation mode defines the operational behavior of the digital human possible values are "oc" (open conversation), "doc qa" (document based question answering), "ttt" (text to talk video generation from text input) and "plugin" for custom plugin implementation promptconfig string used to customize the behavior of the digital human for "doc qa" and "oc" operation modes see nested parameters below for details system prompt (nested promptconfig) array sets the overall system prompt to guide the behavior and personality of the digital human in open conversation mode operational mode "doc qa" includes function calls for accurate retrieval ttsvoice string text to speech voice selects a specific voice for text to speech output choose from a wide range of voices across different providers see \[ voice list documentation ]for available voices greetings string sets the initial greeting message that the digital human will use to start a conversation voiceflowapikey string required only when operationmode is set to "voiceflow" paste your voiceflow conversation api key here to connect the digital human to your voiceflow conversation flow israndomsuggestions boolean random suggestions determines whether suggestions are displayed in a fixed order ( false ) or in a randomized order ( true ) defaults to true if not specified pluginoperationalmodeconfig object plugin configuration used when operationmode is set to "plugin" allows specifying plugin specific configurations refer to the plugin documentation for detailed payload structure customwords object custom words allows defining custom pronunciations for specific words provided as a key value object (dictionary) where keys are the words and values are their custom pronunciations (e g , {"unith" "iunit"} ) case sensitive example requests curl x 'post' \\ 'https //platform api unith ai/head/create' \\ h 'accept application/json' \\ h 'authorization bearer \<yourbearer>' \\ h 'content type application/json' \\ d '{ "headvisualid" "abc123abc123", "name" "ttt", "alias" "repeater", "languagespeechrecognition" "en us", "language" "en us", "ttsprovider" "audiostack", "operationmode" "ttt", "ocprovider" "playground", "ttsvoice" "coco" }' doc qa digital humans need an additional step to be functional this is described in create a doc qa digital human docid 0d qdu8wz5ugua deosea and related to uploading a knowledge document interact with a digital human upon creating a digital human with a single call to the head/create endpoint, your digital human is now hosted by unith at chat unith ai/\<generatedpath> this path can be found in the post head/create response, or in the get head/{id} response public url { "publicid" "headid", "publicurl" "https //chat unith ai/\[org id]/\[head id]?api key=\[org api key]", } embed a digital human in your application see embed digital humans in your application docid\ tdwdj0bo8et9f0k9o5kvb