Prime Intellect’s Prime-RL Aims to Train Trillion-Parameter AI Models
Prime Intellect has released a free, open tool called Prime-RL 0.6.0. It can train some of the biggest AI models in the world. The news came out on June 23, 2026. The tool lets teams train models with up to a trillion parameters. A parameter is one of the tiny values inside an AI model. The model learns these values from data. More parameters usually mean the model can learn more. Prime-RL uses a teaching method called “reinforcement learning,” which we explain below. It works on very large “Mixture-of-Experts” models.
Why does this matter? The know-how for training a model this big used to stay inside a few giant companies. Now Prime Intellect is giving the recipe away for free. This opens the door for small teams, startups, and researchers. They can now study and attempt work whose methods were once out of reach.
What is Prime-RL, in plain words?
Prime-RL is an open framework for “asynchronous reinforcement learning.” Let us break that down slowly.
Reinforcement learning (RL) is a way to teach an AI with rewards. The model tries a task. Then it gets a “good job” or “try again” signal. After many tries, it learns which choices work best. It is a bit like training a dog with treats, but for software.
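The reward loop can be shown with a tiny toy program. This is only a sketch of the general idea, not how Prime-RL trains a model. The choice names, reward values, and settings are all made up for illustration.

```python
# Toy reward loop: the "model" is just a table of scores, one per choice.
# All names and numbers here are hypothetical.
ACTIONS = ["guess_a", "guess_b", "guess_c"]
REWARD = {"guess_a": 0.0, "guess_b": 1.0, "guess_c": 0.2}  # hidden from the learner


def train(tries=60, explore_every=5, step_size=0.5):
    scores = {a: 0.0 for a in ACTIONS}
    for t in range(tries):
        if t % explore_every == 0:
            # Explore: try the next choice in turn, even if it looks bad.
            action = ACTIONS[(t // explore_every) % len(ACTIONS)]
        else:
            # Exploit: pick the choice with the best score so far.
            action = max(ACTIONS, key=scores.get)
        reward = REWARD[action]  # the "good job" or "try again" signal
        # Nudge the score toward the reward that was just received.
        scores[action] += step_size * (reward - scores[action])
    return scores


scores = train()
best = max(scores, key=scores.get)
print(best, scores)
```

After enough tries, the learner settles on the choice that pays the most. Real systems replace the score table with a large neural network, but the loop of try, get a reward, adjust is the same.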
Asynchronous means the different parts of the work do not wait in a strict line. While one part makes answers, another part learns from older answers. This keeps the costly computers busy instead of idle. That saves time and money.
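Here is a minimal sketch of that idea using two threads and a queue. It is an illustration under simple assumptions, not Prime-RL's actual design. The function names and the `max_pending` setting are hypothetical.

```python
import queue
import threading


def generate_rollouts(out_q, n):
    # Stand-in for the part that makes answers.
    for i in range(n):
        out_q.put({"rollout_id": i})
    out_q.put(None)  # sentinel: no more answers are coming


def learn_from_rollouts(in_q, seen):
    # Stand-in for the part that learns from answers as they arrive.
    while True:
        item = in_q.get()
        if item is None:
            break
        seen.append(item["rollout_id"])


def run(n=8, max_pending=4):
    # The bounded queue limits how far the generator can run ahead.
    q = queue.Queue(maxsize=max_pending)
    seen = []
    producer = threading.Thread(target=generate_rollouts, args=(q, n))
    consumer = threading.Thread(target=learn_from_rollouts, args=(q, seen))
    producer.start()
    consumer.start()
    producer.join()
    consumer.join()
    return seen


seen = run()
print(seen)
```

Neither side waits for the other to finish all its work. The queue size is one simple way to cap how old the answers can get before the learner sees them.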
A “framework” here is a ready-made set of tools and code. Teams use it so they do not have to build everything from scratch.
What does “Mixture-of-Experts” mean?
A Mixture-of-Experts (MoE) model is made of many smaller “expert” networks. Think of a hospital with many specialist doctors. When a patient walks in, you do not call every doctor. You call the one or two who fit the problem.
MoE models work the same way. They hold a huge number of parameters. But they only switch on the experts they need for each task. So the model can be very large yet still run at a fair cost. This is how it reaches a trillion parameters without the cost blowing up.
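The "call only the right doctors" step can be sketched in a few lines. This is a toy version of top-k routing, a common way MoE layers pick experts. The four experts and the router scores below are invented for illustration; in a real model, a small learned network produces the scores.

```python
import math


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route(router_scores, k=2):
    """Pick the k highest-scoring experts and renormalize their weights."""
    probs = softmax(router_scores)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return {i: probs[i] / norm for i in top}


def moe_layer(x, experts, router_scores, k=2):
    # Only the chosen experts run; the others do no work for this input.
    weights = route(router_scores, k)
    return sum(w * experts[i](x) for i, w in weights.items())


# Four toy "experts"; each is just a different function of the input.
experts = [lambda x: x + 1, lambda x: x * 2, lambda x: x * x, lambda x: -x]
router_scores = [0.1, 2.0, 1.5, -1.0]

chosen = route(router_scores, k=2)
output = moe_layer(3.0, experts, router_scores, k=2)
print(chosen, output)
```

With these scores, only experts 1 and 2 run. The model still "owns" all four experts, but it pays the compute cost for just two of them.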
What can it do? The GLM-5 example
Prime Intellect showed Prime-RL training a model called GLM-5 on software engineering tasks. These are “agentic” tasks. That means the AI works in many steps and uses tools, like a worker fixing bugs in real code. Some tasks ran for 100 or more turns per try.
The company says the training handled a sequence length of 131,000 tokens. A token is a small chunk of text, often a word or part of a word. A long sequence length means the model can read a lot of code or text at once. Each training step took under five minutes. Each step used a batch of 256 rollouts, where a rollout is one complete attempt at a task. The run used 28 H200 nodes, which are powerful AI computers.
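To get a feel for these numbers, here is a rough back-of-the-envelope calculation. It assumes every rollout fills the full sequence length and that a step takes exactly five minutes. Neither assumption comes from Prime Intellect, so treat the results as an illustration of scale only.

```python
# Reported figures.
SEQ_LEN_TOKENS = 131_000
ROLLOUTS_PER_STEP = 256
STEP_TIME_SECONDS = 5 * 60  # "under five minutes", taken at the limit
NODES = 28

# Assumption: every rollout uses the full sequence length.
max_tokens_per_step = SEQ_LEN_TOKENS * ROLLOUTS_PER_STEP
tokens_per_second = max_tokens_per_step / STEP_TIME_SECONDS
tokens_per_second_per_node = tokens_per_second / NODES

print(f"{max_tokens_per_step:,} tokens per step at most")
print(f"{tokens_per_second:,.0f} tokens per second across the cluster")
print(f"{tokens_per_second_per_node:,.0f} tokens per second per node")
```

Under these assumptions, one step could cover up to about 33.5 million tokens. Real rollouts are often shorter than the maximum, so the true figure per step is likely lower.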
Key facts
| Item | Detail |
|---|---|
| Tool | Prime-RL 0.6.0 |
| Maker | Prime Intellect |
| Release date | June 23, 2026 |
| Goal | Train trillion-parameter MoE models with reinforcement learning |
| Example model trained | GLM-5 (software engineering tasks) |
| Sequence length | 131,000 tokens |
| Step time | Under 5 minutes |
| Batch size | 256 rollouts |
| Hardware used | 28 H200 nodes |
| License / access | Open-source; deployable via a single command on Slurm clusters |
Specs and technical features (as reported)
Here are the technical pieces Prime Intellect listed. We keep only the figures and features they reported. We also explain the hard words simply.
| Feature | What it is / what it does |
|---|---|
| Model type supported | Trillion-parameter Mixture-of-Experts (MoE) |
| Training method | Asynchronous reinforcement learning for agentic tasks |
| Training base | Built on torchtitan with 3-D parallelism: FSDP, Context Parallelism (CP), Expert Parallelism (EP) |
| Number format | Block-scaled FP8 quantization (a compact way to store numbers to save memory) |
| Inference speed-ups | FP8 inference with DeepEP and DeepGEMM kernels |
| Expert handling | Wide Expert Parallelism across 32 or more GPUs |
| Memory trick | Tiered KV cache offloading to CPU and disk |
| Stability trick | Router replay (R3) cuts KL mismatch by “roughly an order of magnitude” |
| Token handling | Prefill/Decode disaggregation for 4:1 token ratios |
| Sample models named | GLM-5.1, Kimi-K2.7-Code, NVIDIA-Nemotron-3-Ultra-550B |
What it means: Prime-RL spreads a giant model across many computers. It also shrinks the numbers to take less space. So a trillion-parameter model can be trained without breaking the budget.
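The "shrink the numbers" idea can be sketched as follows. This toy code imitates block-scaled FP8: values are grouped into blocks, and each block gets its own scale before rounding. The rounding function is a simplified stand-in for the common E4M3 FP8 format, and the block sizes and sample values are made up. It is not Prime-RL's code.

```python
import math

FP8_MAX = 448.0            # largest finite value in the E4M3 FP8 format
MIN_NORMAL = 2.0 ** -6     # smallest normal E4M3 value
SUBNORMAL_STEP = 2.0 ** -9 # spacing of values below MIN_NORMAL


def round_to_fp8_like(x):
    """Toy rounding: 4 significant bits, clamped to +/-FP8_MAX."""
    x = max(-FP8_MAX, min(FP8_MAX, x))
    if abs(x) < MIN_NORMAL:
        return round(x / SUBNORMAL_STEP) * SUBNORMAL_STEP
    mantissa, exponent = math.frexp(x)    # x = mantissa * 2**exponent
    mantissa = round(mantissa * 16) / 16  # keep 4 significant bits
    return math.ldexp(mantissa, exponent)


def quantize_blocks(values, block_size):
    """Return one scale per block plus the rounded codes."""
    scales, codes = [], []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        max_abs = max(abs(v) for v in block)
        scale = max_abs / FP8_MAX if max_abs > 0 else 1.0
        scales.append(scale)
        codes.extend(round_to_fp8_like(v / scale) for v in block)
    return scales, codes


def dequantize_blocks(scales, codes, block_size):
    return [c * scales[i // block_size] for i, c in enumerate(codes)]


def max_relative_error(values, block_size):
    scales, codes = quantize_blocks(values, block_size)
    restored = dequantize_blocks(scales, codes, block_size)
    return max(abs(r - v) / abs(v) for r, v in zip(restored, values))


# Hypothetical weights: four tiny values followed by four large ones.
values = [0.001, 0.002, -0.0015, 0.0005, 100.0, 250.0, -300.0, 50.0]
err_one_scale = max_relative_error(values, block_size=8)  # one shared scale
err_blocked = max_relative_error(values, block_size=4)    # one scale per block
print(err_one_scale, err_blocked)
```

With a single shared scale, the smallest value is rounded away to zero. With one scale per block, every value keeps a close approximation, which is the reason for scaling in blocks.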
A quick note on a few terms above. Inference is when a trained model actually answers a question, instead of learning. GPUs are special chips that do AI math very fast. FSDP (Fully Sharded Data Parallel) is a way to split a model into pieces across many chips. That way, no single chip has to hold the whole thing.
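The splitting idea behind FSDP can be sketched with plain lists. This is a toy picture only. Real FSDP works on tensors across GPUs, and the function names here are hypothetical.

```python
def shard(params, num_chips):
    """Split a flat list of parameters into near-equal pieces, one per chip."""
    size = -(-len(params) // num_chips)  # ceiling division
    return [params[i * size:(i + 1) * size] for i in range(num_chips)]


def all_gather(shards):
    """Rebuild the full parameter list when a layer needs it for its math."""
    return [p for piece in shards for p in piece]


params = list(range(10))  # stand-in for one layer's weights
shards = shard(params, num_chips=4)
full = all_gather(shards)
print(shards)
```

Each chip stores only its own piece most of the time. The full set is gathered just before a layer runs and can be dropped again afterward, which is what keeps memory use per chip low.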
Why the speed tricks matter
Training huge models is slow and costly. So Prime Intellect added some tricks. One is “router replay.” They say it cuts the “KL mismatch,” a measure of how far the model that writes answers drifts from the model being trained, by roughly ten times. Another is “tiered KV cache offloading.” It moves data to cheaper memory, first the CPU and then disk, when fast memory runs low. These tricks keep the system fast and steady, even when one task runs for more than 100 turns.
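The offloading idea can be sketched as a small three-level store. This is a toy model, not Prime-RL's code. The class name, the slot counts, and the rule of moving out the least recently used item first are all assumptions made for illustration.

```python
from collections import OrderedDict


class TieredCache:
    """Toy three-tier store: the fast tier spills to slower tiers when full."""

    def __init__(self, gpu_slots, cpu_slots):
        self.gpu = OrderedDict()  # fastest and smallest
        self.cpu = OrderedDict()  # slower and larger
        self.disk = {}            # slowest; treated as unlimited here
        self.gpu_slots = gpu_slots
        self.cpu_slots = cpu_slots

    def put(self, key, value):
        # Make sure the key lives in exactly one tier.
        self.cpu.pop(key, None)
        self.disk.pop(key, None)
        self.gpu[key] = value
        self.gpu.move_to_end(key)
        self._spill()

    def get(self, key):
        for tier in (self.gpu, self.cpu, self.disk):
            if key in tier:
                value = tier.pop(key)
                self.put(key, value)  # promote back to the fast tier
                return value
        raise KeyError(key)

    def where(self, key):
        for name in ("gpu", "cpu", "disk"):
            if key in getattr(self, name):
                return name
        return None

    def _spill(self):
        # Move the least recently used entries down one tier at a time.
        while len(self.gpu) > self.gpu_slots:
            k, v = self.gpu.popitem(last=False)
            self.cpu[k] = v
        while len(self.cpu) > self.cpu_slots:
            k, v = self.cpu.popitem(last=False)
            self.disk[k] = v


cache = TieredCache(gpu_slots=2, cpu_slots=2)
for i in range(5):
    cache.put(f"turn{i}", f"kv-data-{i}")
before = [cache.where(f"turn{i}") for i in range(5)]
cache.get("turn0")  # reading old data pulls it back into fast memory
after = [cache.where(f"turn{i}") for i in range(5)]
print(before, after)
```

Older data drifts down to slower storage as new data arrives, and comes back up when it is needed again. Nothing is thrown away, so a long task can keep its history without filling the fastest memory.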
FAQ
What is Prime-RL 0.6.0?
It is a free, open framework from Prime Intellect. It trains very large AI models using reinforcement learning. The 0.6.0 update lets it train trillion-parameter Mixture-of-Experts models.
What is a trillion-parameter model?
It is a model with about a trillion adjustable values. The model learns these values from data. More parameters can mean more skill. But they also cost more to train and run.
Do I need to pay for Prime-RL?
No. Prime Intellect says it is open-source. You can set it up with a single command on Slurm clusters. But you still need powerful hardware to run it at full scale.
What are “agentic” tasks?
These are jobs where the AI works in many steps and uses tools. One example is fixing code in a real software project. In the GLM-5 example, some tasks ran for 100 or more turns per try.
Why it matters (especially for India / founders)
For Indian founders and students, open tools like this are a big deal. The hardest part of building advanced AI was never just ideas. It was the costly, secret know-how locked inside a few labs. When that recipe is free and open, more teams can join the race.
A startup in Bengaluru or a college lab in Pune can now study how trillion-parameter training works. They can train models on Indian languages or local business problems. They do not have to wait for a big foreign lab to do it. The hardware still costs money, but the playbook is now public.
For business owners, this points to a bigger trend. AI that can do long, multi-step tasks, like fixing software on its own, is getting cheaper to build. Over time, that could lower the cost of the AI tools you use to run your company.
The takeaway
Prime-RL 0.6.0 pushes open AI training to a new size. It shows that trillion-parameter, Mixture-of-Experts models can be trained with reinforcement learning on real, multi-step work. And it does this with a free framework. For anyone watching how fast open AI is moving, this is one more sign. The gap between giant labs and everyone else is getting smaller.
Source: MarkTechPost