In a major push for “Sovereign AI,” Bengaluru-based startup Sarvam AI has officially launched Saaras V3, its latest-generation Automatic Speech Recognition (ASR) model. Released on February 11, 2026, Saaras V3 is engineered to handle code-mixed speech (Hinglish, Tanglish, and similar blends) and noisy environments, two conditions under which global AI models typically struggle.
The launch is part of Sarvam’s 14-day “launch blitz” leading up to the India-AI Impact Summit 2026.
Benchmarking Excellence: Beating the Giants
The most striking claim from the Saaras V3 release is its performance against global frontier models. According to Sarvam, Saaras V3 recorded a significantly lower Word Error Rate (WER) than Western competitors on the IndicVoices and Svarah datasets. The company's reported IndicVoices figures are:
| Model | IndicVoices WER (Lower is Better) |
| --- | --- |
| Sarvam Saaras V3 | ~19.3% |
| OpenAI GPT-4o Transcribe | ~24.5% |
| Google Gemini 3 Pro | ~26.1% |
| Deepgram Nova-3 | ~23.8% |
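For readers unfamiliar with the metric, WER is conventionally computed as the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and the model's output, divided by the number of reference words. The sketch below is a minimal illustration of that standard definition, not Sarvam's evaluation pipeline; the function names and the sample Hinglish sentence are hypothetical, and real benchmarks usually normalize text (casing, punctuation, numerals) before scoring.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from output
                curr[j - 1] + 1,     # insertion: extra word in output
                prev[j - 1] + cost,  # substitution, or a match when cost is 0
            )
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline_wer: float, new_wer: float) -> float:
    """Fraction by which new_wer improves on baseline_wer."""
    return (baseline_wer - new_wer) / baseline_wer


# Hypothetical code-mixed example: the output drops one of seven words.
sample_wer = word_error_rate(
    "mujhe kal ka ticket book karna hai",
    "mujhe kal ticket book karna hai",
)

# Using the approximate figures reported in the table above.
gain_vs_gpt4o = relative_reduction(24.5, 19.3)
```

Applied to the approximate figures in the table, the gap between ~24.5% and ~19.3% works out to a relative error reduction of roughly 21%, which gives a sense of scale for the claim.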
Key Features of Saaras V3
Saaras V3 isn’t just a basic update; it introduces a new architecture trained on over one million hours of multilingual Indian audio.
- 22-Language Support: Natively understands all 22 scheduled Indian languages plus English.
- Real-Time Streaming: Unlike batch-only models, Saaras V3 can transcribe audio as it is being spoken with a “Time-to-First-Token” (TTFT) of under 150 ms.
- Advanced Diarization: Automatically identifies and labels different speakers in a single recording, ideal for meeting transcripts and call center audits.
- Numeric Fidelity: High precision in capturing dates, currency, and phone numbers, even when spoken in mixed languages.
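To make the streaming claim concrete, TTFT is generally understood as the delay between starting a request and receiving the first piece of output. The sketch below shows one way to measure it against any token iterator. It does not call Sarvam's API: `fake_stream` is a hypothetical stand-in that simulates latency with `time.sleep`, and `measure_ttft` is an illustrative helper, so a real measurement would swap in an actual streaming client.

```python
import time
from typing import Iterable, Iterator, List, Tuple


def measure_ttft(stream: Iterable[str]) -> Tuple[float, List[str]]:
    """Return (seconds until the first token arrived, all tokens received)."""
    start = time.perf_counter()
    ttft = None
    tokens: List[str] = []
    for token in stream:
        if ttft is None:
            # First token has arrived: record the elapsed time once.
            ttft = time.perf_counter() - start
        tokens.append(token)
    if ttft is None:
        raise ValueError("stream produced no tokens")
    return ttft, tokens


def fake_stream(tokens: List[str], first_delay_s: float, gap_s: float) -> Iterator[str]:
    """Hypothetical simulated transcript stream (not a real ASR client)."""
    time.sleep(first_delay_s)  # simulated latency before the first token
    for index, token in enumerate(tokens):
        if index > 0:
            time.sleep(gap_s)  # simulated gap between later tokens
        yield token


ttft_s, received = measure_ttft(fake_stream(["kal", "meeting", "hai"], 0.05, 0.01))
meets_target = ttft_s < 0.150  # compare against the sub-150 ms figure quoted above
```

Because the generator only starts running when the loop first pulls from it, the simulated startup delay falls inside the timed window, which mirrors how a client would experience latency on a live connection.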
Pricing and Developer Access
Sarvam is positioning Saaras V3 as a cost-effective alternative for Indian enterprises, offering a “pay-per-use” model that is significantly cheaper than global APIs.
| Service Type | Price (INR) | Unit |
| --- | --- | --- |
| Speech to Text | ₹30 | Per Hour |
| Speech to Text + Diarization | ₹45 | Per Hour |
| Speech to Text + Translation | ₹30 | Per Hour |
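As a rough budgeting aid, the per-hour rates above can be turned into a cost estimate for a given volume of audio. The sketch below assumes charges scale linearly with audio duration; the article does not state the billing granularity or any minimum charge, so that is an assumption, and the dictionary keys and function name are hypothetical labels rather than API identifiers.

```python
from decimal import Decimal

# Rates in INR per hour of audio, taken from the pricing table above.
RATES_INR_PER_HOUR = {
    "stt": Decimal("30"),
    "stt_diarization": Decimal("45"),
    "stt_translation": Decimal("30"),
}


def estimate_cost_inr(service: str, audio_seconds: int) -> Decimal:
    """Estimate cost, assuming billing is pro-rated by audio duration."""
    if audio_seconds < 0:
        raise ValueError("audio_seconds must be non-negative")
    rate = RATES_INR_PER_HOUR[service]
    return rate * Decimal(audio_seconds) / Decimal(3600)


# A 90-minute meeting transcribed with speaker labels.
meeting_cost = estimate_cost_inr("stt_diarization", 90 * 60)

# 10,000 call-center recordings of 6 minutes each, plain transcription.
call_center_cost = 10_000 * estimate_cost_inr("stt", 6 * 60)
```

Under that linear-billing assumption, the 90-minute diarized meeting comes to ₹67.50, and the 1,000 hours of call-center audio come to ₹30,000.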
Beyond Speech: The Sarvam Ecosystem
Saaras V3 joins a growing suite of specialized models released by the startup in February 2026, including:
- Bulbul V3: A state-of-the-art Text-to-Speech (TTS) model with 35+ professional-quality Indian voices.
- Sarvam Vision: A 3-billion parameter model optimized for OCR and document intelligence in Indic scripts.
- Sarvam Dub: An AI-powered dubbing tool capable of zero-shot voice cloning.


