Voice Selection Guide [TTS]
Overview

The UNITH interface supports multiple text-to-speech (TTS) providers to give your digital human a natural, engaging voice. You can select voices from ElevenLabs or Microsoft Azure directly through the interface, or integrate custom voice providers using our connector framework. For the voice connectors that we support, please check our documentation: https://docs.unith.ai/voice-connectors

Need a different voice provider? You have full flexibility to create custom voice connectors. Please check out the following: https://github.com/unith-ai/voice-connector-template

How Voice Selection Impacts Performance

Digital human responses require audio generation before video synthesis can begin. The audio generation speed directly affects the overall response time of your digital human.

Response pipeline:

1. User query processed
2. Audio generated ← voice model speed matters here
3. Video synthesized from audio
4. Complete response delivered

Faster audio generation means quicker responses and a more natural, engaging user experience.

Recommended Voices by Provider

ElevenLabs

ElevenLabs offers a wide variety of voices powered by different models, each optimized for specific use cases. For digital human applications, we recommend using voices powered by their speed-optimized models.

Recommended models

[Table: recommended ElevenLabs models]

Best practice: Select ElevenLabs voices that use Flash v2, Flash v2.5, Turbo v2, or Turbo v2.5 models for the fastest digital human response times.

Important notes:

- All ElevenLabs models will function correctly with digital humans.
- Non-optimized models may result in longer response delays.
- Speed-optimized models are specifically designed for real-time conversational applications.

For a complete list of available voices and their associated models, see https://elevenlabs.io/docs/overview/capabilities/voices

Microsoft Azure

Microsoft Azure offers an extensive voice catalog across multiple performance tiers. For optimal digital human performance, we recommend selecting voices from their speed-optimized tiers.

Recommended voice types

Select voices that include one of these identifiers in their name:

[Table: recommended Azure voice identifiers]

Voices to avoid

Avoid voices containing HDNeural in their name, as these prioritize audio quality over generation speed and will result in longer response times.

Azure Voice Performance Tiers

The table below provides an overview of Microsoft Azure's voice catalog organized by performance characteristics.

[Table: Azure voice performance tiers]

For multilingual digital humans, prioritize voices with TurboMultilingual in their name to maintain fast response times across all supported languages.

For the complete Azure voice catalog and detailed specifications, visit https://learn.microsoft.com/en-us/azure/ai-services/speech-service/index-text-to-speech

Voice Selection Best Practices

- Prioritize speed-optimized models: choose voices specifically designed for low-latency applications.
- Test before deploying: always test selected voices with your digital human to ensure they meet your quality and performance requirements.
- Consider your audience: balance response speed with voice quality based on your use case.
- Language requirements: if you need multilingual support, select voices that cover all required languages while maintaining performance.
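
The naming rules in this guide can be turned into a quick pre-deployment check. The sketch below is illustrative only and is not part of the UNITH interface or any provider SDK. The function names and verdict strings are hypothetical. The ElevenLabs model IDs are assumed to be the identifiers for the Flash and Turbo models named above. The Azure check uses only the two name fragments this guide mentions (HDNeural and TurboMultilingual), so any other voice is reported as "unknown" rather than judged.

```python
# Assumption: these IDs correspond to the Flash v2, Flash v2.5,
# Turbo v2, and Turbo v2.5 models recommended in this guide.
FAST_ELEVENLABS_MODELS = {
    "eleven_flash_v2",
    "eleven_flash_v2_5",
    "eleven_turbo_v2",
    "eleven_turbo_v2_5",
}


def check_elevenlabs_model(model_id: str) -> str:
    """Return 'recommended' for speed-optimized models, else 'slower'.

    'slower' does not mean broken: every ElevenLabs model works with
    digital humans, but non-optimized ones may add response delay.
    """
    if model_id.strip().lower() in FAST_ELEVENLABS_MODELS:
        return "recommended"
    return "slower"


def check_azure_voice(voice_name: str) -> str:
    """Classify an Azure voice name using the fragments named in this guide.

    Returns 'avoid' for HDNeural voices, 'recommended' for
    TurboMultilingual voices, and 'unknown' for everything else.
    """
    name = voice_name.strip().lower()
    if "hdneural" in name:
        return "avoid"
    if "turbomultilingual" in name:
        return "recommended"
    return "unknown"


if __name__ == "__main__":
    # "en-US-ExampleHDNeural" is a made-up name used only to exercise the rule.
    for voice in ("en-US-AlloyTurboMultilingualNeural", "en-US-ExampleHDNeural"):
        print(voice, "->", check_azure_voice(voice))
    print("eleven_flash_v2_5", "->", check_elevenlabs_model("eleven_flash_v2_5"))
```

A check like this only narrows the candidates. Voices that pass should still be tested with your digital human for quality and response time.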