Huawei has begun shipping its CloudMatrix 384 cluster, built from 384 Ascend 910C processors, which delivers approximately 67% more compute power and more than 3× the memory of Nvidia’s GB200 NVL72 (72 Blackwell GPUs).
🌟 6 Key Insights
- Cluster Powerhouse: While individual Ascend 910C chips aren’t as strong as Nvidia’s GB200 or H100, networking them via optical supernodes achieves immense scale: 300 PFLOPs vs 180 PFLOPs for Nvidia’s cluster.
- Memory Advantage: CloudMatrix offers over three times the memory capacity, enabling larger model training and improved parallelism.
- Efficiency Trade-offs: The cluster draws ~559 kW, about 4× the power of Nvidia’s setup in absolute terms and roughly 2.3× more per unit of compute. Chinese data centers offset this with cheaper energy and labor.
- Domestic AI Sovereignty: Developed in response to U.S. export restrictions, CloudMatrix strengthens China’s self-sufficiency in AI infrastructure.
- Software & Operational Complexity: Against Nvidia’s mature CUDA stack, Huawei relies on the still-nascent CANN/MindSpore ecosystem, which experts warn demands 3–5× more manpower to operate.
- Strategic R&D Push: Backed by Huawei’s record annual R&D spending (~$25 billion), the project shows strong research and engineering resolve, even with chips one generation behind US competitors.
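The headline ratios can be checked with simple arithmetic. The sketch below uses only the figures quoted above (300 vs 180 PFLOPs, 384 vs 72 accelerators, ~559 kW); the variable and function names are illustrative, and the per-chip numbers are plain averages, not measured per-device throughput.

```python
# Figures as quoted in the article; names are illustrative assumptions.
CLOUDMATRIX_PFLOPS = 300.0    # cluster compute, CloudMatrix 384
NVL72_PFLOPS = 180.0          # cluster compute, Nvidia NVL72
CLOUDMATRIX_CHIPS = 384       # Ascend 910C processors
NVL72_CHIPS = 72              # Nvidia accelerators
CLOUDMATRIX_POWER_KW = 559.0  # approximate cluster power draw


def pct_more(a: float, b: float) -> float:
    """Return how much larger a is than b, as a percentage."""
    return (a / b - 1.0) * 100.0


# Cluster-level advantage: 300 / 180 is about 1.67, i.e. ~67% more compute.
cluster_advantage_pct = pct_more(CLOUDMATRIX_PFLOPS, NVL72_PFLOPS)

# Average compute per accelerator (a simple division, not a benchmark).
per_chip_cloudmatrix = CLOUDMATRIX_PFLOPS / CLOUDMATRIX_CHIPS
per_chip_nvl72 = NVL72_PFLOPS / NVL72_CHIPS
per_chip_ratio = per_chip_nvl72 / per_chip_cloudmatrix

# Power needed per PFLOP on the Huawei side.
kw_per_pflop_cloudmatrix = CLOUDMATRIX_POWER_KW / CLOUDMATRIX_PFLOPS

print(f"Cluster compute advantage: {cluster_advantage_pct:.0f}%")
print(f"Per-chip average: {per_chip_cloudmatrix:.2f} vs {per_chip_nvl72:.2f} PFLOPs")
print(f"Nvidia per-chip advantage: {per_chip_ratio:.1f}x")
print(f"CloudMatrix power per PFLOP: {kw_per_pflop_cloudmatrix:.2f} kW")
```

On these numbers each Nvidia accelerator averages about 3.2 times the compute of an Ascend 910C, so the cluster-level lead comes from using more than five times as many chips.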
🤔 Why It Matters
- AI Scale at National Level: Demonstrates China’s ability to build super-sized AI clusters independently of Western GPUs.
- Reframing Efficiency Calculus: Shows trade-offs between raw performance and operational costs in diverse economic contexts.
- Global Competitive Pressure: Intensifies the race between chip ecosystems, pressuring Nvidia to innovate further.
🔭 What Comes Next?
- Industry Uptake: Over 10 units already deployed to Chinese data centers; expect broader rollout among companies such as Alibaba and ByteDance.
- Advanced Chips on Horizon: Huawei is prepping newer Ascend 910D/920 chips (6–7 nm) aiming closer to Nvidia’s H20 class.
- Ecosystem Evolution: Continuous improvements to MindSpore/CANN will be essential for broader adoption.
✅ Final Takeaway
Huawei’s CloudMatrix 384 cluster marks a landmark in high-scale AI performance, outpacing Nvidia’s GB200-based cluster in compute and memory. But it comes at the cost of energy efficiency and software maturity: a powerful mix of ambition and constraint in China’s AI sovereignty drive.


