OpenAI has officially launched gpt-realtime-1.5 in its Realtime API. The update targets production-grade voice agents, moving beyond the beta phase to a more robust, low-latency framework designed for high-stakes enterprise applications.
The model is a multimodal “speech-to-speech” engine that handles audio input and output natively, eliminating the need for separate transcription (ASR) and text-to-speech (TTS) steps.
Key Performance Upgrades
OpenAI has highlighted several “under-the-hood” improvements that address the primary pain points for developers building voice-first applications:
- Multilingual Precision: Significant improvements in language switching and accent recognition, particularly for non-English speakers.
- Instruction Following: A 7% increase in the model’s ability to adhere to complex behavioral prompts during live conversations.
- Alphanumeric Accuracy: A 10.23% boost in accuracy when transcribing and speaking numbers, dates, and codes, which is critical for financial and booking services.
- Reasoning: A 5% gain on Big Bench Audio reasoning benchmarks, making it more capable of solving logic puzzles or complex queries via voice.
Technical Features & Pricing
Version 1.5 keeps a pricing structure consistent with the original realtime release.
| Feature | gpt-realtime-1.5 Specification |
| --- | --- |
| Context Window | 32,000 tokens |
| Max Output | 4,096 tokens |
| Connection Type | WebRTC (client-side) or WebSockets (server-side) |
| New Voices | Includes “Marin” and “Cedar” for improved naturalness |
| Pricing (Text) | $4 per 1M input tokens |
| Pricing (Audio) | $32 per 1M input tokens |
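
To make the figures in the table concrete, here is a minimal Python sketch that estimates input-side cost and checks a request against the context window. It uses only the numbers listed above. The function names are hypothetical, output pricing is not listed here and is therefore left out, and the idea that input tokens and reserved output tokens share the same 32,000-token window is an assumption rather than something the table states.

```python
# Input prices from the table above, in USD per 1M input tokens.
TEXT_INPUT_USD_PER_M = 4.00
AUDIO_INPUT_USD_PER_M = 32.00

# Limits from the table above.
CONTEXT_WINDOW_TOKENS = 32_000
MAX_OUTPUT_TOKENS = 4_096


def input_cost_usd(text_tokens: int, audio_tokens: int) -> float:
    """Estimate input-side cost only; output pricing is not listed above."""
    if text_tokens < 0 or audio_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = text_tokens * TEXT_INPUT_USD_PER_M + audio_tokens * AUDIO_INPUT_USD_PER_M
    return total / 1_000_000


def fits_context(input_tokens: int, reserved_output: int = MAX_OUTPUT_TOKENS) -> bool:
    """Assumption: input and reserved output tokens share one 32,000-token window."""
    return input_tokens + reserved_output <= CONTEXT_WINDOW_TOKENS


if __name__ == "__main__":
    # Hypothetical session: 2,000 text tokens of instructions, 50,000 audio tokens.
    print(f"${input_cost_usd(2_000, 50_000):.3f}")  # prints $1.608
    print(fits_context(27_000))  # True: 27,000 + 4,096 = 31,096
    print(fits_context(28_000))  # False: 28,000 + 4,096 = 32,096
```

At these rates an audio input token costs eight times as much as a text input token, so in this sketch the audio portion of a session dominates the estimate.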
Early Adopter Reports
Several tech partners have already reported performance gains from the new architecture:
- Genspark: Reported that connection rates nearly doubled (reaching 66%) and phone call errors were cut in half.
- Sendbird: Highlighted that the model is significantly better at handling interruptions, allowing for more natural, “human-like” turn-taking without the AI becoming confused.
The “GPT-5” Connection
The launch of gpt-realtime-1.5 is part of a broader rollout of the GPT-5 family. While gpt-realtime-1.5 focuses on low-latency voice, it uses the same foundational reasoning architecture as the GPT-5.2 flagship model released earlier this month, with the aim of making voice agents as capable as their text counterparts.


