ElevenLabs has officially introduced Eleven v3 (alpha), a text-to-speech model capable of whispering, laughing, sighing, and even singing, with a degree of vocal expressiveness that earlier TTS models have struggled to deliver. This alpha release marks a leap in realism and emotional nuance for AI-generated speech.
What Makes Eleven v3 Special?
- Expressive Audio Tags: Users embed cues like [whispers], [laughs], [sings], or [excited] directly into prompts to shape tone, emotion, pacing, and accents.
- Multi-Speaker Dialogue Support: Create realistic back-and-forth speech scenes with overlapping, natural interruptions.
- Massive Language Coverage: Supports over 70 languages, expanded from 29 in previous versions.
- Advanced Text Understanding: Enhanced ability to detect stress patterns, cadence, and subtleties from text alone.
Why Eleven v3 Is a Game-Changer
- Ultra-Realistic Output: Voices no longer sound robotic—they breathe, laugh, and emote.
- Creative Control: Inline audio tags give creators fine-grained command over performance.
- Enhanced Storytelling: Ideal for audiobooks, games, dialogue scenes, and films.
- Global Reach: Supports a broad user base across languages and cultures.
- Future-Ready API: While this alpha focuses on pre-generated (non-real-time) audio, a real-time API is in development.
How It Works
Simply insert audio tags into your script:
[whispers] Something’s coming... [laughs] You won't believe it!
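To make the tag syntax concrete, here is a minimal Python sketch that splits a tagged script into (tag, text) segments and flags tags outside the four named in this article. It only illustrates how inline tags attach to the text that follows them. It is not how Eleven v3 parses prompts, and `KNOWN_TAGS`, `split_tagged_script`, and `unknown_tags` are hypothetical names introduced for this example.

```python
import re

# Tags named in this article; the model may accept others (assumption).
KNOWN_TAGS = {"whispers", "laughs", "sings", "excited"}

# Matches a lowercase tag in square brackets, e.g. "[whispers]".
TAG_PATTERN = re.compile(r"\[([a-z ]+)\]")


def split_tagged_script(script):
    """Split a script into (tag, text) pairs.

    Text that appears before the first tag is paired with None.
    """
    segments = []
    current_tag = None
    pos = 0
    for match in TAG_PATTERN.finditer(script):
        text = script[pos:match.start()].strip()
        if text:
            segments.append((current_tag, text))
        current_tag = match.group(1)
        pos = match.end()
    tail = script[pos:].strip()
    if tail:
        segments.append((current_tag, tail))
    return segments


def unknown_tags(script):
    """Return tags found in the script that are not in KNOWN_TAGS."""
    return [tag for tag in TAG_PATTERN.findall(script) if tag not in KNOWN_TAGS]


if __name__ == "__main__":
    script = "[whispers] Something's coming... [laughs] You won't believe it!"
    for tag, text in split_tagged_script(script):
        print(f"{tag}: {text}")
```

A check like `unknown_tags` can be useful for catching typos in a long script before submitting it, although an unlisted tag is not necessarily invalid.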
Or use the Text-to-Dialogue endpoint:
[
  { "speaker_id": "Alice", "text": "[excited] We did it!" },
  { "speaker_id": "Bob", "text": "[laughs] That was amazing!" }
]
Eleven v3 then generates a seamless, emotionally rich audio clip.
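For longer scenes, the dialogue list can be assembled programmatically. The sketch below builds the list-of-objects shape shown in the example above, using the square-bracket tag syntax from the first example. The field names `speaker_id` and `text` are taken from this article and have not been checked against the official API reference, so treat them as assumptions. `build_dialogue` is a hypothetical helper, and no request is sent.

```python
import json


def build_dialogue(turns):
    """Convert (speaker, tag, text) tuples into a list of dialogue objects.

    A tag of None produces a line with no audio tag.
    Field names follow the article's example (assumption, not verified).
    """
    payload = []
    for speaker, tag, text in turns:
        line = f"[{tag}] {text}" if tag else text
        payload.append({"speaker_id": speaker, "text": line})
    return payload


if __name__ == "__main__":
    turns = [
        ("Alice", "excited", "We did it!"),
        ("Bob", "laughs", "That was amazing!"),
    ]
    # Serialize to JSON so the result can be compared with the example above.
    print(json.dumps(build_dialogue(turns), indent=2))
```

Keeping the turns as plain tuples makes it easy to generate them from a screenplay or spreadsheet before converting them to whatever request format the endpoint actually expects.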
Availability & Pricing
- Alpha release is now live in the ElevenLabs UI.
- 80% off access until the end of June (source: yourstory.com).
- API access and real-time mode are launching soon.
Suggested Use Cases
- Audiobook creators aiming for dramatic, immersive narration
- Content makers adding layered voiceovers with character interplay
- Game developers producing dynamic dialogue in multiple languages
- Accessibility tools requiring true emotional tone and clarity
Summary
Eleven v3 redefines what AI speech can do—whispering, singing, and laughing in a natural human cadence, across 70+ languages. With expressive inline tagging and dialogue support, it offers creators new tools to build deeply engaging audio experiences. And with public alpha available now, the future of voice AI just got a whole lot more human.