Tuesday, March 17, 2026


NVIDIA debuts ‘Groq 3’ LPU chip

At the GTC 2026 keynote on March 16, 2026, NVIDIA CEO Jensen Huang officially introduced the NVIDIA Groq 3 LPU (Language Processing Unit). The chip, NVIDIA’s first hardware release since its reported $20 billion acquisition of the startup Groq in late 2025, is designed to tackle the “inference bottleneck” posed by trillion-parameter models and autonomous AI agents.

The “Inference Inflection” Chip

While NVIDIA’s GPUs (like the new Rubin) are the world’s workhorses for AI training, the Groq 3 LPU is a specialized “token factory” built solely for speed. Unlike GPUs, which rely on high-capacity HBM, the LPU uses ultra-fast on-chip SRAM to deliver near-instantaneous responses.

  • Deterministic Architecture: The LPU functions as a software-defined assembly line, moving data directly between on-chip memory modules to eliminate the latency inherent in general-purpose GPU designs.
  • Extreme Bandwidth: Each Groq 3 LPU features 500 MB of on-chip SRAM delivering a staggering 150 TB/s of bandwidth—nearly 7x faster than the HBM4 found on the flagship Rubin GPU.
  • Agentic AI Focus: The chip is optimized for “test-time scaling,” allowing AI agents to “think” and reason through complex problems in real-time before providing a final answer.
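To see why on-chip bandwidth matters so much for token generation, a back-of-envelope roofline helps: when decode is memory-bound, each generated token must stream the model weights once, so the peak single-stream rate is roughly bandwidth divided by weight footprint. The model size and the HBM4-class bandwidth below are illustrative assumptions, not quoted specs:

```python
# Back-of-envelope roofline for memory-bound decode: generating each
# token streams the model weights once, so the peak single-stream rate
# is roughly bandwidth / weight footprint.
# The model size and GPU bandwidth are illustrative assumptions.

TB = 1e12  # bytes
GB = 1e9   # bytes

def decode_tokens_per_sec(bandwidth_bytes_per_s: float, weight_bytes: float) -> float:
    """Upper bound on decode rate when generation is memory-bandwidth-bound."""
    return bandwidth_bytes_per_s / weight_bytes

# Hypothetical 70B-parameter model stored in FP8 (1 byte per parameter).
# In practice the weights would be sharded across many chips; the
# aggregate bound scales with the number of chips.
weights = 70 * GB

gpu_bound = decode_tokens_per_sec(22 * TB, weights)   # assumed HBM4-class bandwidth
lpu_bound = decode_tokens_per_sec(150 * TB, weights)  # per-chip figure from the article

print(f"GPU bound: {gpu_bound:.0f} tok/s, LPU bound: {lpu_bound:.0f} tok/s")
```

On these assumed numbers, the LPU’s 150 TB/s works out to roughly seven times the GPU’s ceiling, mirroring the bandwidth ratio quoted above.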

The Groq 3 LPX Rack: Scale-Out Power

NVIDIA is not selling the LPU as a standalone component but as part of the Groq 3 LPX rack-scale system. This liquid-cooled infrastructure integrates seamlessly into NVIDIA’s “AI Factories.”

  Specification        NVIDIA Groq 3 LPX (Per Rack)
  LPU Density          256 Interconnected Groq 3 Chips
  Total SRAM           128 GB
  SRAM Bandwidth       40 PB/s (Petabytes per second)
  Inference Compute    315 PFLOPS (FP8)
  Scale-Up Bandwidth   640 TB/s
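The per-rack figures can be cross-checked against the per-chip numbers quoted earlier (500 MB of SRAM and 150 TB/s per LPU, 256 LPUs per rack) with a quick sketch:

```python
# Sanity-check the rack-level aggregates against the per-chip figures
# quoted earlier in the article (500 MB SRAM, 150 TB/s per LPU).

chips = 256
sram_mb_per_chip = 500
bw_tb_s_per_chip = 150

total_sram_gb = chips * sram_mb_per_chip / 1000   # matches the table's 128 GB
total_bw_pb_s = chips * bw_tb_s_per_chip / 1000   # straight multiplication

print(total_sram_gb, total_bw_pb_s)
```

The SRAM total matches the table exactly; the bandwidth multiplication gives 38.4 PB/s against the quoted 40 PB/s, which suggests the table figure is rounded up.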

Co-Design: Rubin + Groq 3

Jensen Huang described the relationship between the Rubin GPU and the Groq 3 LPU as a “unified yet specialized” partnership. When deployed together in a Vera Rubin NVL72 system:

  1. The Rubin GPU handles the “prefill” (understanding the massive prompt and context).
  2. The Groq 3 LPU takes over the “decode” (generating the output tokens at ultra-high speed).
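The division of labor in the two steps above can be sketched as a disaggregated serving loop: a compute-heavy prefill stage hands its attention state to a latency-sensitive decode stage. Everything here is an illustrative Python sketch; none of these class or function names come from an NVIDIA API:

```python
# Illustrative sketch of prefill/decode disaggregation. All names here
# are hypothetical stand-ins, not part of any NVIDIA or Groq API.

from dataclasses import dataclass

@dataclass
class KVCache:
    """Stands in for the attention key/value state handed off after prefill."""
    prompt: str

def prefill(prompt: str) -> KVCache:
    # Compute-heavy pass over the entire prompt (GPU-friendly work).
    return KVCache(prompt=prompt)

def decode(cache: KVCache, max_tokens: int) -> list[str]:
    # Latency-sensitive token-by-token generation (LPU-friendly work).
    return [f"token{i}" for i in range(max_tokens)]

cache = prefill("Summarize the GTC 2026 keynote.")
tokens = decode(cache, max_tokens=3)
print(tokens)  # ['token0', 'token1', 'token2']
```

The design point is that the two stages have opposite hardware profiles: prefill is throughput-bound over a long context, while decode is bandwidth-bound per token, so splitting them lets each run on the silicon it suits.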

This combination delivers up to 35x higher inference throughput per megawatt for trillion-parameter models compared to previous architectures.

Manufacturing and Availability

Confirming a major strategic alliance, Huang announced that the Groq 3 LPU is being manufactured by Samsung Electronics Foundry using its advanced 4nm (SF4X) process.

  • Full Production: The chip is already in volume production.
  • Shipping: Expected to reach global cloud service providers (including AWS, Azure, and Oracle) in the second half of 2026.
  • Successor: Huang briefly teased the “Feynman” platform (slated for 2027), which will incorporate an even more advanced LPU paired with 1.6nm GPUs.
