xAI officially released Grok Voice Think Fast 1.0 on Thursday, April 23, 2026. The new flagship voice model is designed to handle the “messiness” of real-world spoken conversation, combining low-latency responses with background reasoning.
Unlike previous iterations that focused on simple speech-to-text, Think Fast 1.0 is an agentic model built for complex, multi-step workflows like customer sales, technical support, and appointment booking.
1. Key Features: Why “Think Fast”?
The “Think Fast” moniker refers to the model’s ability to perform background reasoning without adding to the conversational latency. It effectively “thinks” while it is listening or preparing to speak.
- Zero Added Latency: xAI claims the model can reason through edge cases and catch its own mistakes in real time while maintaining a snappy, human-like cadence.
- Precise Data Handling: The model excels at capturing structured data, such as email addresses, account numbers, and complex physical addresses, even when the speaker has a thick accent or corrects themselves mid-sentence (a disfluency).
- Harder to Fool: xAI has specifically trained the model to avoid the “hallucination trap” common in voice AI, where models give confident but incorrect answers to ambiguous questions.
- Full-Duplex Interruption: It supports natural turn-taking and can be interrupted mid-sentence without losing the context of the previous request.
2. Benchmark Performance (τ-voice)
Grok Voice Think Fast 1.0 has taken the top spot on the τ-voice (Tau-Voice) Benchmark, which evaluates voice agents in noisy, realistic environments.
| Industry Scenario | Grok Voice Think Fast 1.0 | Gemini 3.1 Flash Live | GPT Realtime 1.5 |
| --- | --- | --- | --- |
| Retail (Orders/Returns) | 62.3% | 45.6% | 38.6% |
| Airlines (Complex Rebooking) | 66.0% | 40.0% | 36.0% |
| Telecom (Billing Disputes) | 73.7% | 21.9% | 21.1% |
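To make the size of the gap in the table above concrete, the short Python sketch below computes Grok Voice Think Fast 1.0's lead over the best competing model in each scenario. The numbers are copied from the table; the variable and function names are illustrative only and are not part of the benchmark or any xAI tooling.

```python
# Scores (percent) copied from the τ-voice table above.
GROK = "Grok Voice Think Fast 1.0"
SCORES = {
    "Retail": {GROK: 62.3, "Gemini 3.1 Flash Live": 45.6, "GPT Realtime 1.5": 38.6},
    "Airlines": {GROK: 66.0, "Gemini 3.1 Flash Live": 40.0, "GPT Realtime 1.5": 36.0},
    "Telecom": {GROK: 73.7, "Gemini 3.1 Flash Live": 21.9, "GPT Realtime 1.5": 21.1},
}


def lead_over_runner_up(row: dict[str, float]) -> float:
    """Gap in percentage points between GROK and the best other model in the row."""
    best_other = max(score for model, score in row.items() if model != GROK)
    return round(row[GROK] - best_other, 1)


# Illustrative helper output: one lead per scenario.
leads = {scenario: lead_over_runner_up(row) for scenario, row in SCORES.items()}
for scenario, gap in leads.items():
    print(f"{scenario}: +{gap} points")
```

On these figures the lead is 16.7 percentage points in Retail, 26.0 in Airlines, and 51.8 in Telecom, so the advantage is widest in the billing-dispute scenario.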
3. Real-World Battle Testing
Before its public API release, the model was “battle-tested” within Elon Musk’s larger ecosystem, where it handled high-stakes customer interactions.
- Starlink Support: It currently powers the primary voice support and sales lines for Starlink customers worldwide.
- Tesla Integration: Integrated into the latest Tesla software update (2026.12) to handle hands-free vehicle diagnostics and service scheduling.
- Multilingual Reach: Natively supports over 25 languages, allowing for global enterprise deployments with localized accents and slang.
4. Developer API and Pricing
As of April 23, the model is available via the xAI Console for developers building interactive voice applications.
- Standalone Speech APIs: The release follows the April 17 launch of standalone Speech-to-Text (STT) and Text-to-Speech (TTS) APIs.
- Pricing (TTS): Approximately $4.20 per million characters for high-fidelity, expressive voices that support tags for laughs, whispers, and emphasis.
- Pricing (STT): $0.10 per hour for batch processing and $0.20 per hour for real-time streaming.
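As a rough budgeting aid, the rates quoted above can be turned into a small cost estimator. This is a minimal sketch that assumes the listed prices apply linearly with no minimums, tiers, or rounding. The function names and the example workload are hypothetical, and the article gives no price for the Think Fast voice model itself, so only the standalone TTS and STT APIs are covered.

```python
# Rates as quoted in the article; the TTS figure is described as approximate.
TTS_USD_PER_MILLION_CHARS = 4.20
STT_BATCH_USD_PER_HOUR = 0.10
STT_STREAMING_USD_PER_HOUR = 0.20


def tts_cost(characters: int) -> float:
    """Estimated TTS cost in USD for a number of synthesized characters."""
    return characters / 1_000_000 * TTS_USD_PER_MILLION_CHARS


def stt_cost(hours: float, streaming: bool = False) -> float:
    """Estimated STT cost in USD for a number of audio hours."""
    rate = STT_STREAMING_USD_PER_HOUR if streaming else STT_BATCH_USD_PER_HOUR
    return hours * rate


# Hypothetical monthly workload: 2.5M characters spoken, 1,000 hours transcribed live.
print(f"TTS: ${tts_cost(2_500_000):.2f}")
print(f"STT (streaming): ${stt_cost(1_000, streaming=True):.2f}")
print(f"STT (batch): ${stt_cost(1_000):.2f}")
```

At these rates, real-time transcription costs twice as much per hour as batch processing, so moving any non-interactive audio to batch halves that part of the bill.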
