Cross-lingual text-to-speech with voice cloning. Clone a voice from 6-10 seconds of reference audio and generate speech in 17+ languages while preserving the original speaker's characteristics.
Clone any voice with just 6-10 seconds of reference audio. Preserve unique vocal characteristics, accent, and speaking style.
Generate speech in 17+ languages while keeping the original speaker's voice characteristics in each one.
Fast inference optimized for real-time applications with streaming support for interactive voice experiences.
Fine-grained control over emotional expression and speaking style while maintaining voice identity.
High-quality audio output with natural prosody and minimal artifacts for professional applications.
Support for multiple speakers in a single session with consistent voice quality and speaker identity.
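Since the feature list above mentions 6-10 seconds of reference audio, it can help to check a clip's length before uploading it. The following is a minimal sketch, not part of the service: it assumes ffprobe (from FFmpeg) is installed, the function names clip_length_ok and clip_duration are made up for illustration, and it treats 6-10 seconds as a guideline rather than a limit the API is known to enforce.

```shell
#!/bin/sh
# Sketch: check that a reference clip falls in the 6-10 second window.
# Assumption: ffprobe (part of FFmpeg) is available on PATH.

# Succeeds when the duration in seconds ($1) is between 6 and 10 inclusive.
clip_length_ok() {
  awk -v d="$1" 'BEGIN { exit !(d + 0 >= 6 && d + 0 <= 10) }'
}

# Prints the duration of an audio file ($1) in seconds, as reported by ffprobe.
clip_duration() {
  ffprobe -v error -show_entries format=duration \
    -of default=noprint_wrappers=1:nokey=1 "$1"
}

# Example: the probe only runs when the sample file is present.
if [ -f speaker_sample.wav ]; then
  if clip_length_ok "$(clip_duration speaker_sample.wav)"; then
    echo "speaker_sample.wav: length OK"
  else
    echo "speaker_sample.wav: outside the 6-10 s guideline" >&2
  fi
fi
```

Keeping the range check separate from the probe means the check can be reused with a duration obtained from any other tool.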
curl -X POST https://api.clim.ai/v1/us/xtts \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: multipart/form-data" \
  -F "text=Hello, this is my cloned voice speaking in a new language!" \
  -F "reference_audio=@speaker_sample.wav" \
  -F "language=en" \
  -F "output_format=wav" \
  --output cloned_speech.wav
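To illustrate the cross-lingual use case, the request above can be wrapped in a small function and called once per language with the same reference clip. This is a sketch under stated assumptions: the endpoint and form fields are copied from the request above, while the xtts_clone function, the XTTS_API_KEY and XTTS_DRY_RUN variables, and the "es" language code are hypothetical and not confirmed by the service. By default the script only prints what it would send.

```shell
#!/bin/sh
# Sketch: reuse one reference clip to synthesize the same speaker in two languages.
# Assumptions: XTTS_API_KEY, XTTS_DRY_RUN, and the "es" language code are illustrative.

XTTS_URL="https://api.clim.ai/v1/us/xtts"
# Dry run is the default; set XTTS_DRY_RUN=0 to actually send requests.
XTTS_DRY_RUN="${XTTS_DRY_RUN:-1}"

# $1 = language code, $2 = text, $3 = reference audio path, $4 = output path
xtts_clone() {
  if [ "$XTTS_DRY_RUN" = "1" ]; then
    # Print a summary of the request instead of calling the API.
    printf 'POST %s language=%s reference_audio=%s output=%s\n' \
      "$XTTS_URL" "$1" "$3" "$4"
    return 0
  fi
  # curl -F builds the multipart body and sets the Content-Type header itself.
  curl --fail -sS -X POST "$XTTS_URL" \
    -H "Authorization: Bearer ${XTTS_API_KEY:?set XTTS_API_KEY}" \
    -F "text=$2" \
    -F "reference_audio=@$3" \
    -F "language=$1" \
    -F "output_format=wav" \
    --output "$4"
}

# Same speaker sample, two target languages.
xtts_clone en "Hello, this is my cloned voice." speaker_sample.wav cloned_en.wav
xtts_clone es "Hola, esta es mi voz clonada." speaker_sample.wav cloned_es.wav
```

Reading the key from an environment variable keeps it out of shell history, and --fail makes curl return a non-zero status on HTTP errors so a failed request is not silently saved as an audio file.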
Discover how XTTS v2 can power your voice cloning and cross-lingual applications.