GSI Technology has announced that its new “Associative Processing Unit” (APU), the Gemini-I, delivers throughput comparable to NVIDIA’s RTX A6000 GPU on certain AI workloads while consuming up to 98% less energy.
The result comes from an independent study led by researchers at Cornell University and presented at the MICRO ’25 conference (the IEEE/ACM International Symposium on Microarchitecture).
What exactly was measured
- The Cornell study used realistic workloads, notably retrieval-augmented generation (RAG) tasks over datasets ranging from 10 GB to 200 GB (Investing.com); a minimal sketch of the retrieval step appears after this list.
- On those RAG tasks the Gemini-I APU matched the throughput of an NVIDIA RTX A6000 while drawing ~98% less power, i.e., using only ~2% of the energy for the same work.
- It also reportedly reduced retrieval latency by up to 80% compared with conventional CPU-based retrieval.
- The architecture is a “compute-in-memory” (CIM), or “compute-in-SRAM”, design that integrates processing logic directly into the memory arrays, cutting the data-movement bottleneck between processor and memory.
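To make concrete what such a RAG benchmark times, here is a minimal sketch of the retrieval step in Python. The corpus size, embedding dimensionality, and brute-force search are illustrative assumptions, not the Cornell study’s configuration:

```python
# Minimal sketch of the retrieval step a RAG benchmark times: score a
# batch of query embeddings against a corpus of document embeddings
# and keep the top-k matches. Sizes and the brute-force dot-product
# search are illustrative, not the study's actual setup.
import time
import numpy as np

rng = np.random.default_rng(0)
corpus = rng.standard_normal((200_000, 128), dtype=np.float32)   # stand-in vector index
queries = rng.standard_normal((100, 128), dtype=np.float32)

start = time.perf_counter()
scores = queries @ corpus.T                             # dot-product similarity
top10 = np.argpartition(-scores, 10, axis=1)[:, :10]    # top-10 doc ids per query
elapsed = time.perf_counter() - start

print(f"{len(queries) / elapsed:,.0f} queries/s")       # throughput, the headline metric
```

At the 10–200 GB scales the study reports, the scoring pass sweeps the entire embedding store, which is why an architecture that computes inside the memory arrays, rather than shuttling data to a processor, pays off on exactly this kind of workload.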
Why this is a big deal
- Energy efficiency jump: With AI workloads consuming ever more power (data centres, inference farms), cutting energy by ~98% while maintaining performance would be a massive leap; see the back-of-envelope sketch after this list.
- Performance parity: Matching a high-end GPU like the NVIDIA RTX A6000 (a professional card widely used for AI inference and visualization) suggests the architecture isn’t just low-power on toy tasks but viable for real workloads.
- New architectural paradigm: The compute-in-memory / compute-in-SRAM concept challenges the traditional separation of memory and compute in modern processors. If this works at scale, it could shift how AI hardware gets designed.
- Edge & data-centre implications: Lower power usage means less cooling, smaller infrastructure, potential for use in constrained environments (edge devices, embedded systems) as well as large data centres.
- Competitive pressure: This puts pressure on established GPU companies (e.g., NVIDIA, AMD) and might accelerate innovation in energy-efficient AI hardware.
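The back-of-envelope sketch referenced above shows the scale of the energy claim. The 300 W figure is the RTX A6000’s rated maximum board power, used here as a load assumption; it is not a measurement from the study:

```python
# Back-of-envelope scale of the ~98% claim. 300 W is the RTX A6000's
# rated maximum board power, assumed here as the draw under load;
# the APU figure simply applies the reported saving at equal throughput.
gpu_watts = 300.0
apu_watts = gpu_watts * (1 - 0.98)            # ~6 W at the same throughput
hours_per_year = 24 * 365

gpu_kwh = gpu_watts * hours_per_year / 1000   # ~2,628 kWh per card-year
apu_kwh = apu_watts * hours_per_year / 1000   # ~53 kWh per card-year
print(f"GPU: {gpu_kwh:,.0f} kWh/yr, APU-equivalent: {apu_kwh:,.0f} kWh/yr, "
      f"saving: {gpu_kwh - apu_kwh:,.0f} kWh/yr")
```

Roughly 2,500 kWh saved per card-year, before counting the cooling that no longer has to remove that heat, which is where the data-centre economics argument comes from.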
Key limitations & what to watch
- Workload scope is limited: The study focused on RAG workloads and datasets of 10–200 GB. It is not yet clear how the Gemini-I APU performs on other types of AI workloads (e.g., training large language models, or inference over much larger models).
- Ecosystem & software support: Even if the hardware is superior on a certain benchmark, success will depend on tooling, software stack, compatibility, and wide adoption. GPUs benefit from mature ecosystems; CIM is newer.
- Scalability and commercialisation: The hardware must scale (fabrication, yield, reliability) and show consistent performance in real-world systems (not just lab benchmarks).
- Latency, memory size, model compatibility: Some tasks require large memory capacities, multiple accelerators working in concert, or very high numerical precision, and the APU has not yet been fully tested under those conditions.
- Energy claims need real-world validation: Controlled benchmarks show ~98% lower energy, but real-world system-level efficiency (including power delivery, cooling, and host overheads) may vary; a simple power-sampling sketch follows this list.
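For the validation point above, one rough way to sanity-check device-side energy during a benchmark is to poll nvidia-smi and integrate power over time. This is a generic sketch, not the study’s methodology, and it deliberately omits host CPU, fan, and PSU losses, which is exactly the system-level gap noted:

```python
# Sample GPU board power via nvidia-smi while a workload runs, then
# integrate over time. Captures only the GPU itself, not host CPU,
# fans, or power-supply losses.
import subprocess
import time

samples = []
t_end = time.time() + 60               # sample for 60 s while the workload runs
while time.time() < t_end:
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=power.draw",
         "--format=csv,noheader,nounits"],
        text=True,
    )
    samples.append(float(out.strip().splitlines()[0]))  # watts, first GPU only
    time.sleep(1)

# ~1 s sampling interval, so summing watt readings approximates joules
joules = sum(samples)
print(f"avg {sum(samples)/len(samples):.1f} W, ~{joules/3600:.2f} Wh over the run")
```

Run the same workload on both platforms and compare watt-hours per query; a credible efficiency claim should survive this kind of whole-run measurement, not just spot power readings.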
Implications for India & the global AI hardware market
- For Indian AI companies / startups: This development suggests alternatives to high-power GPUs may soon exist or become more accessible, potentially lowering barriers to building AI infrastructure domestically.
- Data-centre operators in India: Energy costs (electricity, cooling) are major concerns. Hardware that delivers similar compute at far lower power could reduce operational expenses significantly.
- Edge computing in India (IoT, remote, low-power scenarios): Smaller, more efficient hardware means smarter devices in power-constrained settings, a boon for remote or rural deployments.
- Skill and supply-chain implications: If compute-in-memory hardware gains momentum, engineers and developers will need to upskill for new architectures; also supply-chain (semiconductor manufacturing, memory tech) may shift.
- Competitive dynamics: Indian hardware ecosystem and chip design may benefit from this trend by focusing on low-power, high-efficiency compute rather than direct brute-force high-power design.
What’s next
- Watch for the Gemini-II APU from GSI Technology, which the company says will deliver ~10× higher throughput and even lower latency (Investing.com).
- Real-world deployment and partner announcements: who uses this chip, in what systems, and whether independent benchmarks replicate the results.
- Further research publications beyond RAG workloads: how does compute-in-memory perform on training, large-model inference, different precision formats, etc.?
- Hardware integration: form-factors, memory/chip packaging, system-level power/thermal profiles, compatibility with existing AI frameworks/infrastructure.
- Competitive response: how GPU makers respond with more efficient architectures, and how cloud providers/inference service providers adapt.
Final Thought
GSI Technology’s announcement of the Gemini-I APU matching NVIDIA RTX A6000 performance with ~98% less energy could mark a turning point in AI hardware. It suggests that the future of AI compute could be not just faster, but far more efficient, reshaping how we think about power, performance, and hardware architecture. That said, the real test will be scaling this from benchmark to widespread use.