Sarvam AI releases ‘Saaras V3’ model

In a major push for “Sovereign AI,” Bengaluru-based startup Sarvam AI has officially launched Saaras V3, its latest-generation Automatic Speech Recognition (ASR) model. Released on February 11, 2026, Saaras V3 is engineered specifically for two conditions that typically cause global AI models to struggle: “code-mixed” speech (Hinglish, Tanglish, etc.) and noisy environments.

The launch is part of Sarvam’s 14-day “launch blitz” leading up to the India-AI Impact Summit 2026.

Benchmarking Excellence: Beating the Giants

The most striking claim from the Saaras V3 release is its performance against global frontier models. According to Sarvam, Saaras V3 recorded a lower Word Error Rate (WER) than Western competitors on the IndicVoices and Svarah datasets. The company-reported IndicVoices figures are:

| Model | IndicVoices WER (lower is better) |
| --- | --- |
| Sarvam Saaras V3 | ~19.3% |
| OpenAI GPT-4o Transcribe | ~24.5% |
| Google Gemini 3 Pro | ~26.1% |
| Deepgram Nova-3 | ~23.8% |
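
For readers unfamiliar with the metric, WER is the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and the model's output, divided by the number of reference words. The sketch below is a minimal, generic implementation and not Sarvam's evaluation code; the function names and the sample sentences are illustrative assumptions, and real benchmarks usually normalize text (casing, punctuation, numerals) before scoring.

```python
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref: List[str] = reference.split()
    hyp: List[str] = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from output
            insertion = curr[j - 1] + 1  # extra word in the output
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline_wer: float, new_wer: float) -> float:
    """Fraction by which new_wer improves on baseline_wer."""
    return (baseline_wer - new_wer) / baseline_wer


# Illustrative code-mixed (Hinglish) example: one wrong word out of six.
wer = word_error_rate(
    "mujhe kal meeting attend karni hai",
    "mujhe kal meeting attend karna hai",
)
print(f"WER: {wer:.3f}")
```

Applied to the reported figures, `relative_reduction(24.5, 19.3)` comes to roughly 0.21, which means the claimed gap over GPT-4o Transcribe is about a 21% relative reduction in word errors.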

Key Features of Saaras V3

Saaras V3 isn’t just a basic update; it introduces a new architecture trained on over one million hours of multilingual Indian audio.

  • 22-Language Support: Natively understands all 22 scheduled Indian languages plus English.
  • Real-Time Streaming: Unlike batch-only models, Saaras V3 can transcribe audio as it is being spoken, with a “Time-to-First-Token” (TTFT) of under 150 ms.
  • Advanced Diarization: Automatically identifies and labels different speakers in a single recording, ideal for meeting transcripts and call center audits.
  • Numeric Fidelity: High precision in capturing dates, currency, and phone numbers, even when spoken in mixed languages.
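
The TTFT figure in the streaming bullet above is the delay before the first piece of transcript arrives. As a rough sketch of how such a latency could be measured against any streaming transcriber, the code below times the first item produced by an iterator. It does not call Sarvam's API: `fake_stream` is a hypothetical stand-in that simulates a delay, and in a real test the clock would normally start when the first audio chunk is sent.

```python
import time
from typing import Iterable, Iterator, Tuple


def time_to_first_token(stream: Iterable[str]) -> Tuple[float, str]:
    """Return (seconds until the first token, the first token)."""
    start = time.perf_counter()
    iterator = iter(stream)
    try:
        first = next(iterator)
    except StopIteration:
        raise ValueError("stream produced no tokens") from None
    return time.perf_counter() - start, first


def fake_stream(delay_s: float) -> Iterator[str]:
    """Hypothetical stand-in for a streaming ASR response."""
    time.sleep(delay_s)  # simulated network and model latency
    yield "namaste"
    yield "duniya"


latency_s, token = time_to_first_token(fake_stream(0.05))
print(f"first token {token!r} after {latency_s * 1000:.0f} ms")
```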

Pricing and Developer Access

Sarvam is positioning Saaras V3 as a cost-effective alternative for Indian enterprises, offering a “pay-per-use” model that is significantly cheaper than global APIs.

| Service Type | Price (INR) | Unit |
| --- | --- | --- |
| Speech to Text | ₹30 | Per hour |
| Speech to Text + Diarization | ₹45 | Per hour |
| Speech to Text + Translation | ₹30 | Per hour |
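
To put the hourly rates in practical terms, the sketch below estimates the cost of a given amount of audio. The rates come from the pricing listed above; the service keys are made-up labels, and the per-second pro-rating is an assumption, since the article does not state Sarvam's billing granularity or any minimum charge.

```python
from decimal import Decimal

# Hourly rates in INR, taken from the pricing listed in the article.
# The dictionary keys are illustrative labels, not official product names.
PRICE_PER_HOUR_INR = {
    "stt": Decimal("30"),
    "stt_diarization": Decimal("45"),
    "stt_translation": Decimal("30"),
}


def estimate_cost_inr(service: str, audio_seconds: int) -> Decimal:
    """Estimate cost in INR, assuming linear per-second pro-rating."""
    if audio_seconds < 0:
        raise ValueError("audio_seconds must be non-negative")
    rate = PRICE_PER_HOUR_INR[service]
    hours = Decimal(audio_seconds) / Decimal(3600)
    return (rate * hours).quantize(Decimal("0.01"))


# A 90-minute call-center recording with speaker labels.
print(estimate_cost_inr("stt_diarization", 90 * 60))
```

Under these assumptions, 1,000 hours of plain transcription would come to ₹30,000.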

Beyond Speech: The Sarvam Ecosystem

Saaras V3 joins a growing suite of specialized models released by the startup in February 2026, including:

  • Bulbul V3: A state-of-the-art Text-to-Speech (TTS) model with 35+ professional-quality Indian voices.
  • Sarvam Vision: A 3-billion parameter model optimized for OCR and document intelligence in Indic scripts.
  • Sarvam Dub: An AI-powered dubbing tool capable of zero-shot voice cloning.
