At the GTC 2026 keynote on March 16, 2026, NVIDIA CEO Jensen Huang officially introduced the NVIDIA Groq 3 LPU (Language Processing Unit). The chip, NVIDIA's first hardware release since the company's reported $20 billion acquisition of the startup Groq in late 2025, is designed to tackle the "inference bottleneck" posed by trillion-parameter models and autonomous AI agents.
The “Inference Inflection” Chip
While NVIDIA’s GPUs (like the new Rubin) are the world’s workhorses for AI training, the Groq 3 LPU is a specialized “token factory” built solely for speed. Unlike GPUs that use high-capacity HBM memory, the LPU uses ultra-fast SRAM to deliver near-instantaneous responses.
- Deterministic Architecture: The LPU functions as a software-defined assembly line, moving data directly between on-chip memory modules to eliminate the latency inherent in general-purpose GPU designs.
- Extreme Bandwidth: Each Groq 3 LPU features 500 MB of on-chip SRAM delivering a staggering 150 TB/s of bandwidth—nearly 7x faster than the HBM4 found on the flagship Rubin GPU.
- Agentic AI Focus: The chip is optimized for “test-time scaling,” allowing AI agents to “think” and reason through complex problems in real-time before providing a final answer.
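The "test-time scaling" idea above can be sketched in a few lines: the system spends a variable amount of inference-time compute (reasoning steps) on a problem before committing to an answer. The function names and the refinement rule below are toy placeholders of mine, not NVIDIA software.

```python
# Illustrative sketch of test-time scaling: a larger step budget buys the
# agent more intermediate "thinking" before it returns a final answer.
# reasoning_step() is a stand-in for one pass of model-driven refinement.

def reasoning_step(draft: str, step: int) -> str:
    # In a real agent this would be a model call that improves the draft.
    return draft + f" [refine {step}]"

def answer(question: str, budget: int) -> str:
    draft = question
    for step in range(budget):   # more budget -> more reasoning iterations
        draft = reasoning_step(draft, step)
    return draft

# Spending a budget of 3 steps on a planning problem:
print(answer("route the delivery fleet", budget=3))
```

The point of the sketch is only the control structure: latency-optimized hardware makes each `reasoning_step` cheap enough that agents can afford many of them in real time.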
The Groq 3 LPX Rack: Scale-Out Power
NVIDIA is not selling the LPU as a standalone component but as part of the Groq 3 LPX rack-scale system. This liquid-cooled infrastructure integrates seamlessly into NVIDIA’s “AI Factories.”
| Specification | NVIDIA Groq 3 LPX (Per Rack) |
| --- | --- |
| LPU Density | 256 interconnected Groq 3 chips |
| Total SRAM | 128 GB |
| SRAM Bandwidth | 40 PB/s (petabytes per second) |
| Inference Compute | 315 PFLOPS (FP8) |
| Scale-Up Bandwidth | 640 TB/s |
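The rack-level figures follow from the per-chip specs quoted earlier (500 MB of SRAM and 150 TB/s per LPU); the quick check below multiplies them out. The rounding assumption in the final comparison is mine, since 256 × 150 TB/s works out slightly under the quoted 40 PB/s.

```python
# Sanity check: derive the per-rack totals from the per-chip numbers
# stated in the article (500 MB SRAM and 150 TB/s per Groq 3 LPU).

CHIPS_PER_RACK = 256
SRAM_PER_CHIP_GB = 0.5      # 500 MB of on-chip SRAM per LPU
BW_PER_CHIP_TBPS = 150      # 150 TB/s of SRAM bandwidth per LPU

total_sram_gb = CHIPS_PER_RACK * SRAM_PER_CHIP_GB
total_bw_pbps = CHIPS_PER_RACK * BW_PER_CHIP_TBPS / 1000  # TB/s -> PB/s

print(total_sram_gb)   # 128.0 GB, matching the table exactly
print(total_bw_pbps)   # 38.4 PB/s, i.e. the table's ~40 PB/s after rounding
```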
Co-Design: Rubin + Groq 3
Jensen Huang described the relationship between the Rubin GPU and the Groq 3 LPU as a “unified yet specialized” partnership. When deployed together in a Vera Rubin NVL72 system:
- The Rubin GPU handles the “prefill” (understanding the massive prompt and context).
- The Groq 3 LPU takes over the “decode” (generating the output tokens at ultra-high speed).
This combination delivers up to 35x higher inference throughput per megawatt for trillion-parameter models compared to previous architectures.
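The prefill/decode handoff described above can be sketched as a two-stage pipeline: a compute-bound stage that ingests the full prompt, and a latency-bound stage that streams output tokens. Everything below is a conceptual illustration of disaggregated serving, not an NVIDIA API; the function names and the trivial token rule are mine.

```python
# Conceptual sketch of disaggregated inference: prefill on one tier
# (the Rubin GPU in the article), decode on another (the Groq 3 LPU).

def prefill(prompt: str) -> list[str]:
    """Stand-in for the GPU prefill stage: process the whole prompt at
    once to build the context state (here, just the tokenized prompt)."""
    return prompt.split()

def decode(context: list[str], max_tokens: int) -> list[str]:
    """Stand-in for the LPU decode stage: emit output tokens one at a
    time. A real system samples each token from the model."""
    return [f"token_{i}" for i in range(max_tokens)]

def serve(prompt: str, max_tokens: int = 4) -> list[str]:
    ctx = prefill(prompt)            # stage 1: compute-bound, batch-friendly
    return decode(ctx, max_tokens)   # stage 2: bandwidth/latency-bound

print(serve("Explain the inference bottleneck"))
```

The design point the split captures: prefill parallelizes well across a large batch, while decode is a serial, memory-bandwidth-bound loop, so routing each phase to hardware specialized for it raises throughput per watt.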
Manufacturing and Availability
Confirming a major strategic alliance, Huang announced that the Groq 3 LPU is being manufactured by Samsung Electronics Foundry using its advanced 4nm (SF4X) process.
- Full Production: The chip is already in full production.
- Shipping: Expected to reach global cloud service providers (including AWS, Azure, and Oracle) in the second half of 2026.
- Successor: Huang briefly teased the “Feynman” platform (slated for 2027), which will incorporate an even more advanced LPU paired with 1.6nm GPUs.


