OpenAI launched GPT-5.3-Codex-Spark, a specialized “research preview” model that achieves a breakthrough in inference speed by hitting 1,200+ tokens per second, approximately 15x faster than the standard GPT-5.3-Codex. That throughput effectively enables “real-time” conversational coding.
The speed boost is the result of a first-of-its-kind hardware partnership with Cerebras Systems, utilizing their Wafer-Scale Engine 3 (WSE-3).
The Speed vs. Intelligence Trade-off
To achieve these record-breaking speeds, OpenAI built Spark as a “distilled” and “pruned” version of the GPT-5.3 architecture, small enough to fit entirely within the ultra-fast on-chip SRAM of a single Cerebras wafer. Keeping the whole model on one wafer eliminates the latency of shuttling data between traditional GPUs.
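Some rough arithmetic makes the constraint concrete. The WSE-3’s published spec includes 44 GB of on-chip SRAM, so the resident weights (plus working memory such as the KV cache) must fit inside that budget. A minimal back-of-envelope sketch, where the precision and cache share are illustrative assumptions rather than disclosed figures:

```python
# Back-of-envelope: how large a model fits in WSE-3 on-chip SRAM?
# The 44 GB SRAM figure is Cerebras's published WSE-3 spec; the
# precision and KV-cache share below are assumptions for illustration.
SRAM_BYTES = 44 * 1024**3        # 44 GB of on-chip SRAM
BYTES_PER_PARAM = 1              # assume 8-bit quantized weights
KV_CACHE_FRACTION = 0.25         # assume ~25% reserved for KV cache

weight_budget = SRAM_BYTES * (1 - KV_CACHE_FRACTION)
max_params = weight_budget / BYTES_PER_PARAM
print(f"Max resident parameters: ~{max_params / 1e9:.0f}B")  # ~35B
```

Under those assumptions the ceiling is on the order of tens of billions of parameters, which is why a distilled, pruned variant is needed rather than the full GPT-5.3-Codex.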
| Feature | GPT-5.3-Codex (Standard) | GPT-5.3-Codex-Spark |
| --- | --- | --- |
| Generation Speed | ~70–80 tokens/sec | 1,200+ tokens/sec |
| Reasoning Score (Terminal-Bench 2.0) | ~77.3% | ~58.4% |
| Context Window | 400k+ tokens | 128k tokens (text-only) |
| Hardware | NVIDIA H100 / B200 clusters | Cerebras WSE-3 |
Key Features for Developers
OpenAI is positioning “Spark” not as a replacement for high-reasoning models, but as a “Super-Autocomplete” engine.
- “Conversational” Programming: The low latency (a roughly 50% reduction in time-to-first-token versus the standard model) lets developers interrupt, redirect, and iterate on code mid-generation without breaking their “flow state”; see the streaming sketch after this list.
- Targeted Edits: By default, Spark is tuned to make minimal, surgical changes to code rather than rewriting large blocks, saving both tokens and review time.
- Interactive Code Blocks: Integrated with the new Interactive Code Blocks in ChatGPT (launched Feb 19), Spark can render diagrams and mini-apps almost instantly as you describe them.
- Zero Quota Impact: Usage of Codex-Spark currently has its own “rate limit bucket” for ChatGPT Pro users, meaning it doesn’t count against your standard GPT-5.3 or “Thinking” limits.
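The interrupt-and-redirect loop is easy to picture in code. A minimal sketch, assuming Spark is reachable through the OpenAI Python SDK’s standard streaming chat API (the model ID mirrors the CLI flag in the access section below; whether the research preview is actually exposed this way is an assumption):

```python
# Sketch of "conversational" programming: stream tokens, bail out early
# if the draft goes off course, then re-prompt. Model ID and endpoint
# availability are assumptions for this research preview.
from openai import OpenAI

client = OpenAI()

stream = client.chat.completions.create(
    model="gpt-5.3-codex-spark",  # assumed model ID, per the CLI flag below
    messages=[{"role": "user", "content": "Write pytest cases for slugify()"}],
    stream=True,
)

received = ""
for chunk in stream:
    delta = chunk.choices[0].delta.content or ""
    print(delta, end="", flush=True)
    received += delta
    if "class TestSlugify" in received:  # e.g. we wanted functions, not a class
        stream.close()                   # abandon the draft mid-generation
        break                            # ...and immediately re-prompt
```

At 1,200+ tokens/sec an abandoned draft costs a fraction of a second, which is what makes this interrupt-heavy style practical.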
Best Use Cases (The “Drafter-Reviewer” Pattern)
Early developer feedback suggests a two-step workflow (sketched in code after the list) is the most efficient way to use the new speed:
- Draft with Spark: Use it for “grunt work”—generating unit tests, boilerplate, regex, or simple UI components.
- Review with GPT-5.3-Codex: If a task requires complex multi-file logic or security-critical code, use the standard model to verify the Spark-generated output.
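A minimal sketch of the pattern, assuming both models are served through the same chat API (the model IDs, prompts, and helper below are illustrative, not an official recipe):

```python
# Drafter-reviewer: Spark drafts the grunt work fast; the standard
# high-reasoning model verifies it. Model IDs are assumptions.
from openai import OpenAI

client = OpenAI()

def complete(model: str, prompt: str) -> str:
    """One-shot chat completion; returns the assistant's text."""
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

source = "def slugify(text): ..."  # stand-in for the real module

# Step 1: draft the grunt work with the fast model.
draft = complete("gpt-5.3-codex-spark",
                 "Generate unit tests for this helper:\n" + source)

# Step 2: escalate only the verification to the high-reasoning model.
review = complete("gpt-5.3-codex",
                  "Review these generated tests for missed edge cases and "
                  "security issues; reply with a corrected version:\n" + draft)
print(review)
```

The design point is cost and latency asymmetry: the cheap, fast model produces volume, and the expensive model is invoked once per task as a gate.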
How to Access
- Web/App: Select the “Spark” toggle in the model selector (available for ChatGPT Pro/Enterprise).
- CLI: Use the flag `-m gpt-5.3-codex-spark` after updating to the latest `@openai/codex` version (0.104.0).
- IDE: Available as a research preview model in the Codex VS Code extension.
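Putting the CLI pieces together, a typical invocation might look like this (the npm install command assumes the standard distribution of the Codex CLI; only the flag and version number come from the notes above):

```bash
npm install -g @openai/codex@0.104.0   # update to the version noted above
codex -m gpt-5.3-codex-spark           # select the Spark research preview
```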
