OpenAI launched GPT-5.3-Codex-Spark, a specialized “research preview” model that achieves a breakthrough in inference speed by hitting 1,200+ tokens per second, approximately 15x faster than the standard GPT-5.3-Codex. That throughput effectively enables “real-time” conversational coding.
The speed boost is the result of a first-of-its-kind hardware partnership with Cerebras Systems, utilizing their Wafer-Scale Engine 3 (WSE-3).
The Speed vs. Intelligence Trade-off
To achieve these record-breaking speeds, OpenAI built Spark as a “distilled” and “pruned” version of the GPT-5.3 architecture, small enough to fit entirely within the ultra-fast on-chip SRAM of a single Cerebras wafer. Keeping the whole model on one wafer eliminates the latency of shuttling data between traditional GPUs.
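Some rough arithmetic makes the constraint concrete. The WSE-3’s published spec includes 44 GB of on-chip SRAM, so the resident weights (plus working memory such as the KV cache) must fit inside that budget. A minimal back-of-envelope sketch, where the precision and cache share are illustrative assumptions rather than disclosed figures:

```python
# Back-of-envelope: how large a model fits in WSE-3 on-chip SRAM?
# The 44 GB SRAM figure is Cerebras's published WSE-3 spec; the
# precision and KV-cache share below are assumptions for illustration.
SRAM_BYTES = 44 * 1024**3        # 44 GB of on-chip SRAM
BYTES_PER_PARAM = 1              # assume 8-bit quantized weights
KV_CACHE_FRACTION = 0.25         # assume ~25% reserved for KV cache

weight_budget = SRAM_BYTES * (1 - KV_CACHE_FRACTION)
max_params = weight_budget / BYTES_PER_PARAM
print(f"Max resident parameters: ~{max_params / 1e9:.0f}B")  # ~35B
```

Under those assumptions the ceiling is on the order of tens of billions of parameters, which is why a distilled, pruned variant is needed rather than the full GPT-5.3-Codex.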
| Feature | GPT-5.3-Codex (Standard) | GPT-5.3-Codex-Spark |
| --- | --- | --- |
| Generation Speed | ~70–80 tokens/sec | 1,200+ tokens/sec |
| Reasoning Score (Terminal-Bench 2.0) | ~77.3% | ~58.4% |
| Context Window | 400k+ tokens | 128k tokens (text-only) |
| Hardware | NVIDIA H100 / B200 clusters | Cerebras WSE-3 |
Key Features for Developers
OpenAI is positioning “Spark” not as a replacement for high-reasoning models, but as a “Super-Autocomplete” engine.
- “Conversational” Programming: The low latency (a roughly 50% reduction in time-to-first-token versus the standard model) lets developers interrupt, redirect, and iterate on code mid-generation without breaking their “flow state”; see the streaming sketch after this list.
- Targeted Edits: By default, Spark is tuned to make minimal, surgical changes to code rather than rewriting large blocks, saving both tokens and review time.
- Interactive Code Blocks: Integrated with the new Interactive Code Blocks in ChatGPT (launched Feb 19), Spark can render diagrams and mini-apps almost instantly as you describe them.
- Zero Quota Impact: Usage of Codex-Spark currently has its own “rate limit bucket” for ChatGPT Pro users, meaning it doesn’t count against your standard GPT-5.3 or “Thinking” limits.
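The interrupt-and-redirect loop is easy to picture in code. A minimal sketch, assuming Spark is reachable through the OpenAI Python SDK’s standard streaming chat API (the model ID mirrors the CLI flag in the access section below; whether the research preview is actually exposed this way is an assumption):

```python
# Sketch of "conversational" programming: stream tokens, bail out early
# if the draft goes off course, then re-prompt. Model ID and endpoint
# availability are assumptions for this research preview.
from openai import OpenAI

client = OpenAI()

stream = client.chat.completions.create(
    model="gpt-5.3-codex-spark",  # assumed model ID, per the CLI flag below
    messages=[{"role": "user", "content": "Write pytest cases for slugify()"}],
    stream=True,
)

received = ""
for chunk in stream:
    delta = chunk.choices[0].delta.content or ""
    print(delta, end="", flush=True)
    received += delta
    if "class TestSlugify" in received:  # e.g. we wanted functions, not a class
        stream.close()                   # abandon the draft mid-generation
        break                            # ...and immediately re-prompt
```

At 1,200+ tokens/sec an abandoned draft costs a fraction of a second, which is what makes this interrupt-heavy style practical.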
Best Use Cases (The “Drafter-Reviewer” Pattern)
Early developer feedback suggests a two-step workflow (sketched in code after the list) is the most efficient way to use the new speed:
- Draft with Spark: Use it for “grunt work”—generating unit tests, boilerplate, regex, or simple UI components.
- Review with GPT-5.3-Codex: If a task requires complex multi-file logic or security-critical code, use the standard model to verify the Spark-generated output.
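A minimal sketch of the pattern, assuming both models are served through the same chat API (the model IDs, prompts, and helper below are illustrative, not an official recipe):

```python
# Drafter-reviewer: Spark drafts the grunt work fast; the standard
# high-reasoning model verifies it. Model IDs are assumptions.
from openai import OpenAI

client = OpenAI()

def complete(model: str, prompt: str) -> str:
    """One-shot chat completion; returns the assistant's text."""
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

source = "def slugify(text): ..."  # stand-in for the real module

# Step 1: draft the grunt work with the fast model.
draft = complete("gpt-5.3-codex-spark",
                 "Generate unit tests for this helper:\n" + source)

# Step 2: escalate only the verification to the high-reasoning model.
review = complete("gpt-5.3-codex",
                  "Review these generated tests for missed edge cases and "
                  "security issues; reply with a corrected version:\n" + draft)
print(review)
```

The design point is cost and latency asymmetry: the cheap, fast model produces volume, and the expensive model is invoked once per task as a gate.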
How to Access
- Web/App: Select the “Spark” toggle in the model selector (available for ChatGPT Pro/Enterprise).
- CLI: Use the flag `-m gpt-5.3-codex-spark` after updating to the latest `@openai/codex` version (0.104.0).
- IDE: Available as a research preview model in the Codex VS Code extension.
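Putting the CLI pieces together, a typical invocation might look like this (the npm install command assumes the standard distribution of the Codex CLI; only the flag and version number come from the notes above):

```bash
npm install -g @openai/codex@0.104.0   # update to the version noted above
codex -m gpt-5.3-codex-spark           # select the Spark research preview
```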
