XTTS v2

Cross-lingual Text-to-Speech with voice cloning capabilities. Clone any voice with just a few seconds of audio and generate speech in 17+ languages while preserving the original speaker's characteristics.


📌 What XTTS v2 Is Good At

Voice Cloning

Clone any voice with just 6-10 seconds of reference audio. Preserve unique vocal characteristics, accent, and speaking style.

• Zero-shot voice cloning (no fine-tuning required)
• Preserves vocal timbre
• Maintains speaking style
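Before uploading a reference clip, you may want to confirm that it falls in the 6-10 second window mentioned above. The following is a minimal POSIX shell sketch, not part of any official tooling. The function names `ref_clip_seconds` and `check_reference_clip` are hypothetical. The sketch also makes two assumptions. First, the file is uncompressed PCM WAV with the canonical 44-byte header, so files with extra chunks such as `LIST` will be slightly misjudged. Second, the host is little-endian, because `od -t u4` decodes bytes in host order.

```shell
# Hypothetical helper: estimate the duration of a PCM WAV file in whole
# seconds (rounded down). Assumes the canonical 44-byte header and a
# little-endian host (od -t u4 decodes in host byte order; WAV is little-endian).
ref_clip_seconds() {
    file=$1
    # Byte rate (bytes of audio per second) is the 32-bit field at offset 28.
    byte_rate=$(od -An -t u4 -j 28 -N 4 "$file" | tr -d ' ')
    size=$(wc -c < "$file" | tr -d ' ')
    if [ -z "$byte_rate" ] || [ "$byte_rate" -eq 0 ]; then
        echo "cannot read byte rate from $file" >&2
        return 2
    fi
    # Audio payload = file size minus the 44-byte header.
    echo $(( (size - 44) / byte_rate ))
}

# Hypothetical check: succeed only if the clip is 6-10 seconds long.
check_reference_clip() {
    secs=$(ref_clip_seconds "$1") || return 2
    [ "$secs" -ge 6 ] && [ "$secs" -le 10 ]
}

# Example:
# check_reference_clip speaker_sample.wav || echo "reference clip should be 6-10 s" >&2
```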

Cross-Lingual Synthesis

Generate speech in 17+ languages while preserving the original speaker's voice characteristics.

• 17+ language support
• Cross-lingual voice transfer
• Accent preservation
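Cross-lingual use amounts to sending the same reference clip with a different `language` value on each request. The sketch below reuses the endpoint and form fields from the curl example in the Code Examples section. It makes several assumptions. The language codes are XTTS v2's upstream codes (`es`, `fr`, `de`, ...), and it is not confirmed that this API accepts them. The function names `synthesize` and `synthesize_many`, the `XTTS_URL`/`XTTS_API_KEY`/`DRY_RUN` variables, and the `lang|text` input format are all hypothetical. The sketch also uses curl's `--form-string` for the text, so a leading `@` or a `;type=` in the sentence is sent literally.

```shell
# Hypothetical sketch: reuse one reference clip for several languages.
# Endpoint and form fields mirror the documented curl example; language
# codes are XTTS v2's upstream codes, assumed (not confirmed) to be accepted.
XTTS_URL=${XTTS_URL:-https://api.clim.ai/v1/us/xtts}

# $1 = text, $2 = language code, $3 = reference WAV, $4 = output file.
# Set DRY_RUN to a non-empty value to print the curl command instead of running it.
synthesize() {
    s_text=$1 s_lang=$2 s_ref=$3 s_out=$4
    set -- curl --fail -sS -X POST "$XTTS_URL" \
        -H "Authorization: Bearer ${XTTS_API_KEY:-YOUR_API_KEY}" \
        --form-string "text=$s_text" \
        -F "reference_audio=@$s_ref" \
        -F "language=$s_lang" \
        -F "output_format=wav" \
        --output "$s_out"
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$@"
    else
        "$@"
    fi
}

# $1 = reference WAV. Reads "lang|text" lines on stdin and writes
# speech_<lang>.wav for each line.
synthesize_many() {
    ref=$1
    while IFS='|' read -r lang text; do
        synthesize "$text" "$lang" "$ref" "speech_$lang.wav"
    done
}

# Example (sends three requests unless DRY_RUN is set):
# printf '%s\n' 'es|Hola, esta es mi voz clonada.' \
#     'fr|Bonjour, ceci est ma voix clonée.' \
#     'de|Hallo, das ist meine geklonte Stimme.' | synthesize_many speaker_sample.wav
```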

Real-Time Generation

Fast inference optimized for real-time applications with streaming support for interactive voice experiences.

• Real-time synthesis
• Streaming audio output
• Low-latency inference
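Claims of real-time synthesis are commonly measured with the real-time factor (RTF). RTF is the wall-clock synthesis time divided by the duration of the audio produced, and values below 1.0 mean audio is generated faster than it plays back. Below is a small `awk`-based sketch. The helper names `rtf` and `is_realtime` are hypothetical.

```shell
# Hypothetical helper: real-time factor = synthesis time / audio duration,
# both in seconds. RTF below 1.0 means audio is generated faster than it plays.
rtf() {
    awk -v t="$1" -v d="$2" 'BEGIN {
        if (d <= 0) exit 1          # audio duration must be positive
        printf "%.3f\n", t / d
    }'
}

# Succeed if synthesis kept up with playback (RTF < 1.0).
is_realtime() {
    awk -v t="$1" -v d="$2" 'BEGIN { exit !(d > 0 && t < d) }'
}

# Example: 0.8 s to synthesize a 4 s clip
# rtf 0.8 4        # prints 0.200
```

To get the first argument, you can use curl's `-w '%{time_total}'` write-out variable, which prints the request's total time in seconds. The second argument is the duration of the returned clip. For streaming playback, time to first audio chunk is usually the more relevant number.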

Emotional Control

Fine-grained control over emotional expression and speaking style while maintaining voice identity.

• Emotion conditioning
• Style transfer
• Prosody control

High Fidelity

High-quality audio output with natural prosody and minimal artifacts for professional applications.

• 24kHz sample rate
• Natural prosody
• Minimal artifacts
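To confirm that a returned file actually uses the 24 kHz sample rate listed above, you can read the sample-rate field from the WAV header. This is a minimal sketch. It assumes the `fmt ` chunk sits in its usual position, so the sample rate is the 32-bit little-endian field at byte offset 24. It also assumes a little-endian host. The helper names `wav_sample_rate` and `is_24khz` are hypothetical.

```shell
# Hypothetical helper: print the sample rate (Hz) stored in a PCM WAV header.
# Assumes the standard layout (sample rate at byte offset 24, little-endian)
# and a little-endian host, since od -t u4 decodes in host byte order.
wav_sample_rate() {
    od -An -t u4 -j 24 -N 4 "$1" | tr -d ' '
}

# Succeed only if the file reports a 24 kHz sample rate.
is_24khz() {
    [ "$(wav_sample_rate "$1")" = "24000" ]
}

# Example:
# is_24khz cloned_speech.wav || echo "unexpected sample rate" >&2
```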

Multi-Speaker Support

Support for multiple speakers in a single session with consistent voice quality and speaker identity.

• Multiple voice profiles
• Speaker consistency
• Voice library management
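On the client side, a voice library can be as simple as one reference clip per speaker in a directory. You can then resolve a speaker name to the file passed as `reference_audio`. This is a hypothetical sketch. The `VOICE_DIR` layout and the `add_voice`/`list_voices`/`voice_path` helpers are illustrative assumptions, not a feature of this API.

```shell
# Hypothetical client-side voice library: one clip per speaker, stored as
# $VOICE_DIR/<name>.wav. The API itself only sees the clip sent as reference_audio.
VOICE_DIR=${VOICE_DIR:-./voices}

# Register a voice: $1 = name, $2 = source WAV file.
add_voice() {
    mkdir -p "$VOICE_DIR" && cp "$2" "$VOICE_DIR/$1.wav"
}

# List registered voice names (file names without the .wav suffix).
list_voices() {
    for f in "$VOICE_DIR"/*.wav; do
        [ -e "$f" ] || continue        # glob matched nothing
        name=${f##*/}
        echo "${name%.wav}"
    done
}

# Print the clip path for a voice, or fail if the voice is unknown.
voice_path() {
    f="$VOICE_DIR/$1.wav"
    if [ -f "$f" ]; then
        echo "$f"
    else
        echo "unknown voice: $1" >&2
        return 1
    fi
}

# Example: -F "reference_audio=@$(voice_path narrator)"
```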

🧱 Infrastructure Compatibility

GPU Requirements

• NVIDIA RTX 4070: Minimum
• NVIDIA RTX 4080: Recommended
• NVIDIA L4: Optimal
• NVIDIA A100: Enterprise
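Given the 6-8 GB VRAM figure quoted under Performance Metrics, you may want to check GPU memory before deploying. `nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits` prints each GPU's total memory in MiB, one line per GPU. The helper `has_gpu_memory` below is hypothetical and reads that output on stdin. The 8192 MiB threshold in the example is an assumption taken from the upper end of the quoted range.

```shell
# Hypothetical helper: read per-GPU total memory in MiB (one value per line,
# as printed by nvidia-smi's memory.total query) and succeed if at least one
# GPU has $1 MiB or more.
has_gpu_memory() {
    awk -v min="$1" '$1 + 0 >= min + 0 { found = 1 } END { exit !found }'
}

# Example (requires an NVIDIA driver):
# nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits | has_gpu_memory 8192
```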

Performance Metrics

• Latency (RTX 4070): ~800 ms
• Latency (L4): ~400 ms
• Concurrency: up to 4 streams
• Memory usage: 6-8 GB VRAM
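The four-stream concurrency figure above suggests capping parallel requests on the client side. This sketch uses `xargs -P` to keep at most `MAX_STREAMS` requests in flight. `-P` and `-0` are supported by GNU and BSD xargs but are not POSIX. The endpoint and form fields mirror the documented curl example. The function name `batch_synthesize`, the `out_<n>.wav` naming, and the `XTTS_URL`/`XTTS_LANG`/`DRY_RUN` variables are hypothetical. Export `XTTS_API_KEY`, `XTTS_LANG` and `DRY_RUN` before calling, because each worker runs in a child shell.

```shell
# Hypothetical batch runner: one request per input line, at most MAX_STREAMS
# in flight (default 4). Endpoint and form fields mirror the documented curl example.
MAX_STREAMS=${MAX_STREAMS:-4}
XTTS_URL=${XTTS_URL:-https://api.clim.ai/v1/us/xtts}

# $1 = reference WAV. Reads one sentence per line on stdin and writes
# out_<line number>.wav. XTTS_API_KEY, XTTS_LANG and DRY_RUN come from the
# environment (export them first).
batch_synthesize() {
    awk '{ print NR "|" $0 }' |          # prefix each line with its number
    tr '\n' '\0' |                       # NUL-separate so quotes in text are safe
    xargs -0 -n 1 -P "$MAX_STREAMS" sh -c '
        n=${3%%"|"*}                     # line number before the first "|"
        text=${3#*"|"}                   # sentence after the first "|"
        set -- curl --fail -sS -X POST "$2" \
            -H "Authorization: Bearer ${XTTS_API_KEY:-YOUR_API_KEY}" \
            --form-string "text=$text" \
            -F "reference_audio=@$1" \
            -F "language=${XTTS_LANG:-en}" \
            -F "output_format=wav" \
            --output "out_$n.wav"
        if [ -n "${DRY_RUN:-}" ]; then echo "$@"; else "$@"; fi
    ' _ "$1" "$XTTS_URL"
}

# Example: batch_synthesize speaker_sample.wav < sentences.txt
```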

📍 Region Compatibility

🇺🇸 USA: Available

• Voice cloning available
• All languages supported
• Real-time synthesis

🇬🇧 UK (London): Available

• GDPR-compliant voice cloning
• European accent support
• Cross-lingual synthesis

🇪🇺 EU (Netherlands): Available

• Data residency guaranteed
• Multi-language cloning
• Privacy-first processing

🇦🇪 UAE (Dubai): Coming soon

• Arabic voice cloning
• Regional compliance
• Middle East deployment

🇸🇬 Singapore: Available

• Asian language cloning
• APAC optimization
• Low latency for Asia
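The documented endpoint contains a region segment (`/v1/us/xtts`), which suggests the region is chosen by URL path. In the sketch below, only `us` is confirmed by the Code Examples section. The `uk`, `eu` and `sg` codes are assumptions that the other regions follow the same `/v1/<region>/xtts` pattern, and they have not been verified. The helper name `xtts_endpoint` is hypothetical.

```shell
# Hypothetical region-to-endpoint mapping. Only "us" appears in the documented
# example; "uk", "eu" and "sg" are ASSUMED to follow the same pattern.
xtts_endpoint() {
    case $1 in
        us|uk|eu|sg) echo "https://api.clim.ai/v1/$1/xtts" ;;
        ae) echo "region ae (UAE) is listed as coming soon" >&2; return 1 ;;
        *)  echo "unknown region: $1" >&2; return 1 ;;
    esac
}

# Example: curl -X POST "$(xtts_endpoint us)" ...
```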

💻 Code Examples

Voice Cloning with XTTS v2

# curl sets the multipart/form-data Content-Type (with boundary) itself for -F fields;
# --fail stops an HTTP error body from being saved as cloned_speech.wav.
curl --fail -X POST https://api.clim.ai/v1/us/xtts \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "text=Hello, this is my cloned voice speaking in a new language!" \
  -F "reference_audio=@speaker_sample.wav" \
  -F "language=en" \
  -F "output_format=wav" \
  --output cloned_speech.wav
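A quick sanity check after the request is to confirm that the saved file really is audio and not an error message written under a `.wav` name. A WAV file starts with `RIFF` at byte 0 and `WAVE` at byte 8. The helper name `is_wav` below is hypothetical, and it uses only `dd` to read those bytes.

```shell
# Hypothetical post-check: a valid WAV response starts with "RIFF" at byte 0
# and "WAVE" at byte 8. An error body (e.g. JSON) saved as .wav fails this test.
is_wav() {
    [ "$(dd if="$1" bs=1 count=4 2>/dev/null)" = "RIFF" ] &&
        [ "$(dd if="$1" bs=1 skip=8 count=4 2>/dev/null)" = "WAVE" ]
}

# Example:
# is_wav cloned_speech.wav || { echo "request failed:" >&2; cat cloned_speech.wav >&2; }
```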
