Nvidia Research: Robots That Train Themselves Using AI Coding Agents

What if a robot could teach itself a new skill, with almost no help from people? That is the big idea behind a new research project called ENPIRE. It uses AI coding agents to help robots learn to grab and move things. An AI coding agent is an AI program that can read, write, and fix computer code on its own. The work comes from Nvidia, Carnegie Mellon University, and UC Berkeley. It was shared on June 17, 2026.

In simple words, the AI agent acts like a smart trainer. It watches the robot. It writes the training code. It tests what works. Then it keeps making things better. People set up the safety rules at the start. After that, the group of robots mostly trains itself.

What is ENPIRE?

ENPIRE is a system that lets robots learn “dexterous” tasks. Dexterous just means skillful hand movements. Examples include picking up a small pin or pushing an object into the right spot. These tasks are hard for robots because the real world is messy and full of tiny details.

Normally, a human expert writes the training code. That same person also watches over each robot. This takes a lot of time and money. ENPIRE tries to remove most of that human work. It hands the job to an AI coding agent instead.

How does it work?

The researchers split the work into two parts. In the first part, a human helps the AI agent set up the workspace. This includes safety limits. It also includes a way for the table to be reset automatically, without a person stepping in. And it includes a way to check if the robot succeeded or failed.

The agent then writes its own “reward function.” A reward function is the rule that tells the robot when it did a good job. To build this rule, the agent watches short videos. It sees the robot winning and the robot losing. It only needs a few minutes of video.
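To make the idea of a reward function concrete, here is a minimal sketch for a pushing task. It is only an illustration. The function name, the tolerance value, and the use of a simple distance check are all assumptions, not the code the ENPIRE agent actually writes from its videos.

```python
import math


def push_reward(object_xy, goal_xy, tolerance=0.02):
    """Hypothetical reward for a pushing task (not ENPIRE's actual code).

    Returns 1.0 when the object is within `tolerance` of the goal,
    otherwise a negative number that gets worse with distance.
    """
    dx = object_xy[0] - goal_xy[0]
    dy = object_xy[1] - goal_xy[1]
    distance = math.hypot(dx, dy)
    if distance <= tolerance:
        return 1.0  # the robot "did a good job"
    return -distance  # further from the goal means a lower reward


# Object almost on the goal: counts as a success.
print(push_reward((0.50, 0.50), (0.50, 0.51)))
# Object far from the goal: reward is negative.
print(push_reward((0.20, 0.50), (0.50, 0.50)))
```

A real reward built from video would be more complex, but the shape is the same: the robot's state goes in, and a score comes out.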

In the second part, the agent works on its own. It reads research papers. It comes up with its own ideas. Then it changes the training code. It can pick the best way to learn for that moment. For example, it may use behavior cloning, which means copying example moves. Or it may use reinforcement learning, which means learning by trial and error with rewards. It picks based on how the real robot is doing.
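The choice between the two learning styles can be pictured as a small decision rule. The sketch below is a made-up example of such a rule. The article does not say the agent follows any fixed rule, and the function name, the inputs, and the 0.3 threshold are all assumptions for illustration.

```python
def choose_method(success_rate, demos_available, threshold=0.3):
    """Hypothetical rule for picking a training method (an assumption,
    not the ENPIRE agent's real logic).

    - If the robot is still failing most of the time and example moves
      exist, copy the examples (behavior cloning).
    - Otherwise, improve by trial and error (reinforcement learning).
    """
    if success_rate < threshold and demos_available > 0:
        return "behavior_cloning"
    return "reinforcement_learning"


print(choose_method(success_rate=0.1, demos_available=20))
print(choose_method(success_rate=0.6, demos_available=20))
```

In ENPIRE, the agent makes this kind of call itself, based on how the real robot is doing, rather than reading it off a hard-coded threshold.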

The results so far

The team tested ENPIRE on a fleet of eight robot stations. A fleet just means a group of robots working together. These robots are called YAM robots, and each one has two arms. The results were strong on some tasks. On certain hard jobs, the robots got it right up to 99% of the time.

Speed also got better when more robots worked together. On a task called Push-T, the team grew the group from 1 robot to 8 robots. This cut the total time from about 5 hours down to 2 hours. On a pin-insertion task, training time dropped from over 90 minutes to about 40 minutes. The system also reached its goal 100% faster than older methods, which means about twice as fast. Those older methods needed a human helping the whole time.
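The time savings above can be turned into simple speedup numbers. The sketch below does the arithmetic using the rounded figures from the article ("about 5 hours", "over 90 minutes"), so the results are rough. The "efficiency" measure is a standard way to judge scaling, not a number the researchers report. The article does not give robot counts for the pin-insertion figures, so only the speedup is computed there.

```python
def speedup(time_before, time_after):
    """How many times faster the new setup is."""
    return time_before / time_after


def parallel_efficiency(time_before, time_after, robots_before, robots_after):
    """Speedup divided by how much the fleet grew (1.0 = perfect scaling)."""
    fleet_growth = robots_after / robots_before
    return speedup(time_before, time_after) / fleet_growth


# Push-T: about 5 hours with 1 robot, 2 hours with 8 robots.
push_t_speedup = speedup(5.0, 2.0)
push_t_efficiency = parallel_efficiency(5.0, 2.0, 1, 8)

# Pin insertion: over 90 minutes down to about 40 minutes.
pin_speedup = speedup(90.0, 40.0)

print(f"Push-T speedup: {push_t_speedup:.2f}x")         # 2.50x
print(f"Push-T efficiency: {push_t_efficiency:.0%}")    # 31%
print(f"Pin-insertion speedup: {pin_speedup:.2f}x")     # 2.25x
```

So eight robots made Push-T training about 2.5 times faster, not 8 times faster. Adding robots helps, but each extra robot adds less than the one before.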

Which AI agents were tested?

The team did not use just one AI agent. They compared three popular coding agents. They wanted to see which one trained robots best. The best one was Codex running on GPT-5.5. The other two were Claude Code on Opus 4.7 and Kimi Code on Kimi K2.6.

Key facts

Detail	Reported figure
Project name	ENPIRE
Built by	Nvidia, Carnegie Mellon University, UC Berkeley
Publication date	June 17, 2026
Robot fleet	8 dual-arm YAM robot stations
Top success rate	Up to 99% on demanding tasks
Push-T time (1 to 8 robots)	~5 hours down to 2 hours
Pin-insertion time	90+ minutes down to ~40 minutes
Convergence speed	100% faster (about twice as fast) than human-in-the-loop
Best AI agent	Codex with GPT-5.5

What about the limits?

The research is honest about its weak spots. The robots did much better in simulation than in the real world. Simulation means a pretend version of the world made on a computer. In fact, two of the three agents failed the real-world Push-T task. They passed the same task in simulation, but not in real life.

There were other trade-offs too. With bigger groups, each robot was used a little less. This is because the robots spent more time waiting and coordinating with each other. Also, the token costs grew faster than the gains. Token costs are the price of running the AI agent. In short, more AI work did not always add enough extra value.

FAQ

Do robots really train without humans?

Not fully. Humans still set up the safety rules and the workspace at the start. After that, the AI coding agent does most of the training work on its own.

What is an AI coding agent?

It is an AI program that can read, write, test, and fix computer code by itself. Here, it writes the code that teaches the robots new skills.

Is ENPIRE ready for factories?

Not yet. It is a research project. The robots do worse in real life than in simulation. So there is still work to do before wide use.

Which AI agent worked best?

Codex with GPT-5.5 was the top performer in the tests. It did better than Claude Code and Kimi Code.

Why it matters (especially for founders in India)

Training robots is slow and costly today. It often needs expert engineers for every machine. But if AI coding agents can take over much of that work, training could get cheaper and faster. That is a big deal for any founder building hardware or automation. A founder is a person who starts a company.

For Indian startups, this points to a hopeful future. Smaller teams could train a group of robots without a huge staff. India is getting stronger in software and AI talent. A system that turns robot training into a coding job fits that strength well. The warning about token costs is also useful. It reminds founders to watch the bill, because AI agents can get pricey fast.

The takeaway

ENPIRE shows a clear path: let AI agents do the hard work of robot training. The early numbers look good, with up to 99% success on some tasks and faster training with bigger groups. But the real-world failures and rising costs show this is still an early step. If these gaps close, self-training robots could change how factories and labs work.

Source: The Decoder
