At the GTC 2026 keynote on March 16, 2026, NVIDIA CEO Jensen Huang officially introduced the NVIDIA Groq 3 LPU (Language Processing Unit). The chip, NVIDIA's first hardware release since the company's reported $20 billion acquisition of the startup Groq in late 2025, is designed to tackle the "inference bottleneck" posed by trillion-parameter models and autonomous AI agents.
The “Inference Inflection” Chip
While NVIDIA’s GPUs (like the new Rubin) are the world’s workhorses for AI training, the Groq 3 LPU is a specialized “token factory” built solely for speed. Unlike GPUs that use high-capacity HBM memory, the LPU uses ultra-fast SRAM to deliver near-instantaneous responses.
- Deterministic Architecture: The LPU functions as a software-defined assembly line, moving data directly between on-chip memory modules to eliminate the latency inherent in general-purpose GPU designs.
- Extreme Bandwidth: Each Groq 3 LPU features 500 MB of on-chip SRAM delivering a staggering 150 TB/s of bandwidth—nearly 7x faster than the HBM4 found on the flagship Rubin GPU.
- Agentic AI Focus: The chip is optimized for “test-time scaling,” allowing AI agents to “think” and reason through complex problems in real-time before providing a final answer.
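The "test-time scaling" idea above can be sketched in a few lines: the system spends a variable amount of inference-time compute (reasoning steps) on a problem before committing to an answer. The function names and the refinement rule below are toy placeholders of mine, not NVIDIA software.

```python
# Illustrative sketch of test-time scaling: a larger step budget buys the
# agent more intermediate "thinking" before it returns a final answer.
# reasoning_step() is a stand-in for one pass of model-driven refinement.

def reasoning_step(draft: str, step: int) -> str:
    # In a real agent this would be a model call that improves the draft.
    return draft + f" [refine {step}]"

def answer(question: str, budget: int) -> str:
    draft = question
    for step in range(budget):   # more budget -> more reasoning iterations
        draft = reasoning_step(draft, step)
    return draft

# Spending a budget of 3 steps on a planning problem:
print(answer("route the delivery fleet", budget=3))
```

The point of the sketch is only the control structure: latency-optimized hardware makes each `reasoning_step` cheap enough that agents can afford many of them in real time.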
The Groq 3 LPX Rack: Scale-Out Power
NVIDIA is not selling the LPU as a standalone component but as part of the Groq 3 LPX rack-scale system. This liquid-cooled infrastructure integrates seamlessly into NVIDIA’s “AI Factories.”
| Specification | NVIDIA Groq 3 LPX (Per Rack) |
| --- | --- |
| LPU Density | 256 interconnected Groq 3 chips |
| Total SRAM | 128 GB |
| SRAM Bandwidth | 40 PB/s (petabytes per second) |
| Inference Compute | 315 PFLOPS (FP8) |
| Scale-Up Bandwidth | 640 TB/s |
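The rack-level figures follow from the per-chip specs quoted earlier (500 MB of SRAM and 150 TB/s per LPU); the quick check below multiplies them out. The rounding assumption in the final comparison is mine, since 256 × 150 TB/s works out slightly under the quoted 40 PB/s.

```python
# Sanity check: derive the per-rack totals from the per-chip numbers
# stated in the article (500 MB SRAM and 150 TB/s per Groq 3 LPU).

CHIPS_PER_RACK = 256
SRAM_PER_CHIP_GB = 0.5      # 500 MB of on-chip SRAM per LPU
BW_PER_CHIP_TBPS = 150      # 150 TB/s of SRAM bandwidth per LPU

total_sram_gb = CHIPS_PER_RACK * SRAM_PER_CHIP_GB
total_bw_pbps = CHIPS_PER_RACK * BW_PER_CHIP_TBPS / 1000  # TB/s -> PB/s

print(total_sram_gb)   # 128.0 GB, matching the table exactly
print(total_bw_pbps)   # 38.4 PB/s, i.e. the table's ~40 PB/s after rounding
```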
Co-Design: Rubin + Groq 3
Jensen Huang described the relationship between the Rubin GPU and the Groq 3 LPU as a “unified yet specialized” partnership. When deployed together in a Vera Rubin NVL72 system:
- The Rubin GPU handles the “prefill” (understanding the massive prompt and context).
- The Groq 3 LPU takes over the “decode” (generating the output tokens at ultra-high speed).
This combination delivers up to 35x higher inference throughput per megawatt for trillion-parameter models compared to previous architectures.
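The prefill/decode handoff described above can be sketched as a two-stage pipeline: a compute-bound stage that ingests the full prompt, and a latency-bound stage that streams output tokens. Everything below is a conceptual illustration of disaggregated serving, not an NVIDIA API; the function names and the trivial token rule are mine.

```python
# Conceptual sketch of disaggregated inference: prefill on one tier
# (the Rubin GPU in the article), decode on another (the Groq 3 LPU).

def prefill(prompt: str) -> list[str]:
    """Stand-in for the GPU prefill stage: process the whole prompt at
    once to build the context state (here, just the tokenized prompt)."""
    return prompt.split()

def decode(context: list[str], max_tokens: int) -> list[str]:
    """Stand-in for the LPU decode stage: emit output tokens one at a
    time. A real system samples each token from the model."""
    return [f"token_{i}" for i in range(max_tokens)]

def serve(prompt: str, max_tokens: int = 4) -> list[str]:
    ctx = prefill(prompt)            # stage 1: compute-bound, batch-friendly
    return decode(ctx, max_tokens)   # stage 2: bandwidth/latency-bound

print(serve("Explain the inference bottleneck"))
```

The design point the split captures: prefill parallelizes well across a large batch, while decode is a serial, memory-bandwidth-bound loop, so routing each phase to hardware specialized for it raises throughput per watt.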
Manufacturing and Availability
Confirming a major strategic alliance, Huang announced that the Groq 3 LPU is being manufactured by Samsung Electronics Foundry using its advanced 4nm (SF4X) process.
- Full Production: The chip is already in full production.
- Shipping: Expected to reach global cloud service providers (including AWS, Azure, and Oracle) in the second half of 2026.
- Successor: Huang briefly teased the “Feynman” platform (slated for 2027), which will incorporate an even more advanced LPU paired with 1.6nm GPUs.


