OpenAI boosts ‘GPT-5.3-Codex-Spark’ speed 30% to 1,200+ Tokens per second

OpenAI has launched GPT-5.3-Codex-Spark, a specialized “research preview” model that pushes inference speed past 1,200 tokens per second, roughly 15x faster than the standard GPT-5.3-Codex. That jump effectively enables “real-time” conversational coding.

The speed boost is the result of a first-of-its-kind hardware partnership with Cerebras Systems, built around the company’s Wafer-Scale Engine 3 (WSE-3).


The Speed vs. Intelligence Trade-off

To achieve these record-breaking speeds, OpenAI uses a “distilled” and “pruned” version of the GPT-5.3 architecture. The smaller model fits entirely within the ultra-fast on-chip SRAM of a single Cerebras wafer, eliminating the latency of shuttling weights and activations between separate GPUs and their memory.

| Feature | GPT-5.3-Codex (Standard) | GPT-5.3-Codex-Spark |
| --- | --- | --- |
| Generation speed | ~70–80 tokens/sec | 1,200+ tokens/sec |
| Reasoning score (Terminal-Bench 2.0) | ~77.3% | ~58.4% |
| Context window | 400k+ tokens | 128k tokens (text-only) |
| Hardware | NVIDIA H100 / B200 clusters | Cerebras WSE-3 |

Key Features for Developers

OpenAI is positioning “Spark” not as a replacement for high-reasoning models, but as a “Super-Autocomplete” engine.

  • “Conversational” Programming: The low latency (a 50% reduction in time-to-first-token) lets developers interrupt, redirect, and iterate on code mid-generation without breaking their “flow state” (see the streaming sketch after this list).
  • Targeted Edits: By default, Spark is tuned to make minimal, surgical changes to code rather than rewriting large blocks, saving both tokens and review time.
  • Interactive Code Blocks: Integrated with the new Interactive Code Blocks in ChatGPT (launched Feb 19), Spark can render diagrams and mini-apps almost instantly as you describe them.
  • Zero Quota Impact: Usage of Codex-Spark currently has its own “rate limit bucket” for ChatGPT Pro users, meaning it doesn’t count against your standard GPT-5.3 or “Thinking” limits.
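
To get a feel for what interruptible, low-latency generation looks like in practice, here is a minimal streaming sketch using the openai Node SDK. Treat the model ID gpt-5.3-codex-spark and its availability over the API as assumptions: the article only confirms access via ChatGPT, the CLI, and the IDE extension.

```typescript
import OpenAI from "openai";

const client = new OpenAI(); // reads OPENAI_API_KEY from the environment

async function main() {
  // Assumption: Spark is exposed over the API under this model ID.
  const stream = await client.chat.completions.create({
    model: "gpt-5.3-codex-spark",
    messages: [{ role: "user", content: "Write a debounce() helper in TypeScript." }],
    stream: true,
  });

  let received = 0;
  for await (const chunk of stream) {
    const delta = chunk.choices[0]?.delta?.content ?? "";
    process.stdout.write(delta);
    received += delta.length;

    // At 1,200+ tokens/sec the draft arrives almost at once, so you can
    // afford to cut generation off the moment you have seen enough.
    if (received > 2_000) {
      stream.controller.abort();
      break;
    }
  }
}

main().catch(console.error);
```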

Best Use Cases (The “Drafter-Reviewer” Pattern)

Early developer feedback suggests a two-step workflow is the most efficient way to put the new speed to work (a code sketch follows the list):

  1. Draft with Spark: Use it for “grunt work”—generating unit tests, boilerplate, regex, or simple UI components.
  2. Review with Codex 5.3: If a task requires complex multi-file logic or security-critical code, use the standard model to verify the Spark-generated output.
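
As a concrete illustration, here is how the drafter-reviewer loop might look in code, again assuming both models are reachable through the standard Chat Completions API. The model IDs mirror the CLI flag named below, and the prompts are purely illustrative.

```typescript
import OpenAI from "openai";

const client = new OpenAI();

// Assumed model IDs; only the CLI flag spelling is confirmed by the article.
const DRAFTER = "gpt-5.3-codex-spark"; // fast, cheap first pass
const REVIEWER = "gpt-5.3-codex";      // slower, higher-reasoning check

async function draftThenReview(task: string): Promise<string> {
  // Step 1: let Spark generate the grunt work at 1,200+ tokens/sec.
  const draft = await client.chat.completions.create({
    model: DRAFTER,
    messages: [{ role: "user", content: `Write the code for this task:\n${task}` }],
  });
  const draftCode = draft.choices[0].message.content ?? "";

  // Step 2: hand the draft to the standard model for a reasoning-heavy review.
  const review = await client.chat.completions.create({
    model: REVIEWER,
    messages: [
      {
        role: "user",
        content: `Review the following code for correctness and security issues, and return a corrected version:\n\n${draftCode}`,
      },
    ],
  });
  return review.choices[0].message.content ?? draftCode;
}

draftThenReview("a unit test suite for an LRU cache class").then(console.log);
```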

How to Access

  • Web/App: Select the “Spark” toggle in the model selector (available for ChatGPT Pro/Enterprise).
  • CLI: Update @openai/codex to version 0.104.0 or later, then pass the flag -m gpt-5.3-codex-spark (example below).
  • IDE: Available as a research preview model in the Codex VS Code extension.
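
For CLI users, the update-and-launch sequence should look roughly like this. The package name, version, and flag come straight from the notes above; the exact command form is an assumption.

```
npm install -g @openai/codex@0.104.0
codex -m gpt-5.3-codex-spark
```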
