Researchers from Nvidia, Carnegie Mellon University, and the University of California, Berkeley have introduced ENPIRE — a framework that enables AI agents to enhance robot control policies on real hardware.
The system operates in a closed loop: a robot performs a task, the environment automatically evaluates the result and resets, while the AI agent analyzes errors, rewrites code, and initiates the next series of tests.
How ENPIRE Works
In robotics, training on real equipment remains an expensive and slow process. After a failed attempt, the scene must be reset, the results checked, the algorithms modified, and the tests rerun, often with an engineer's involvement.
ENPIRE brings to the physical world an approach Nvidia calls AutoResearch: AI agents write code, test it, and improve it over successive iterations. Unlike in a purely digital setting, however, each experiment involves real robots, cameras, and objects, along with grasping errors, friction, and other physical constraints.
The framework consists of four modules:
- Environment manages automatic scene resets, result verification, logging, and safety interfaces;
- Policy Improvement initiates policy enhancement;
- Rollout evaluates the policy on one or more physical robots;
- Evolution allows agents to analyze logs, search the literature for ideas, modify the training infrastructure, and fix code.
After the initial setup, the cycle can continue without constant human oversight. The agent gathers data from videos, trajectories, and reward functions, proposes new hypotheses, modifies code, tests results on the robot, and saves changes if they improve performance.
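At its core, this cycle is a keep-if-better search over code changes. The Python sketch below illustrates only that logic under our own assumptions and is not ENPIRE's implementation: `evaluate_policy` and `propose_change` are hypothetical stand-ins for a batch of real-robot trials and for the coding agent's edit, and the "policy" is reduced to a single number so the loop can run on its own.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class LoopResult:
    best_policy: float
    best_score: float
    history: List[float] = field(default_factory=list)  # score of every policy evaluated


def improve_policy(
    initial_policy: float,
    evaluate_policy: Callable[[float], float],      # hypothetical: run robot trials, return success rate
    propose_change: Callable[[float, int], float],  # hypothetical: the agent's code/parameter edit
    iterations: int,
) -> LoopResult:
    """Keep-if-better loop: propose a change, test it, and commit it only if it scores higher."""
    best_policy = initial_policy
    best_score = evaluate_policy(best_policy)
    history = [best_score]
    for step in range(iterations):
        candidate = propose_change(best_policy, step)
        score = evaluate_policy(candidate)
        history.append(score)
        if score > best_score:  # keep the change only if it improves performance
            best_policy, best_score = candidate, score
    return LoopResult(best_policy, best_score, history)


# Toy stand-ins: the "policy" is one number and the success rate peaks at 3.0.
def toy_success_rate(policy: float) -> float:
    return max(0.0, 1.0 - abs(policy - 3.0) / 10)


def toy_propose(policy: float, step: int) -> float:
    # Alternate a larger step forward with a smaller step back.
    return policy + (1.0 if step % 2 == 0 else -0.5)


result = improve_policy(0.0, toy_success_rate, toy_propose, iterations=6)
print(result.best_policy, result.best_score)
```

Rejected candidates are still recorded in `history`, mirroring how failed runs remain useful evidence for the agent's next hypothesis even though they are not committed.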
The Need for Automated Verification and Resetting
A key element of ENPIRE is the automation of two operations: result verification and scene resetting. The first lets the system determine on its own whether a task has been completed. For example, in a cable-tie scenario, the evaluation function combined a detector, a segmentation model, and checks from two cameras, so the agent received success or failure signals without anyone manually labeling each run.
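To show how several perception signals can be folded into one success/failure label, here is a minimal Python sketch. The fields, the thresholds, and the rule that every camera must agree are our own illustrative assumptions, not details of Nvidia's evaluation function.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class ViewObservation:
    """Signals from one camera view (hypothetical fields)."""
    detector_confidence: float  # detector's confidence that the goal state is visible
    mask_overlap: float         # fraction of the segmented object inside the target region


def view_passes(obs: ViewObservation, min_confidence: float = 0.8, min_overlap: float = 0.9) -> bool:
    # A single view counts as a pass only if the detector and the segmentation check agree.
    return obs.detector_confidence >= min_confidence and obs.mask_overlap >= min_overlap


def episode_success(views: Sequence[ViewObservation]) -> bool:
    # Every camera must confirm success, so one occluded or misleading view
    # cannot turn a failure into a false positive.
    return len(views) > 0 and all(view_passes(v) for v in views)


front = ViewObservation(detector_confidence=0.95, mask_overlap=0.97)
side = ViewObservation(detector_confidence=0.91, mask_overlap=0.60)
print(episode_success([front, side]))  # the side view's segmentation check fails
```

Requiring agreement across views trades a few missed successes for fewer false positives, which matters when an agent will keep or discard code based on this signal.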
Automatic resetting allows for multiple attempts in succession. After a failed action, the robot must return the object or scene to a state suitable for the next experiment. Without this, training on real equipment quickly becomes dependent on constant human involvement.
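A reset is only useful if it is checked. The Python sketch below assumes a hypothetical scripted reset motion plus a perception check of the start state, retries a bounded number of times, and reports failure so that a human can step in. The function names and the retry limit are illustrative assumptions.

```python
from typing import Callable


def reset_scene(
    run_reset_motion: Callable[[], None],  # hypothetical: scripted motion that returns the object
    scene_is_ready: Callable[[], bool],    # hypothetical: perception check of the start state
    max_attempts: int = 3,
) -> bool:
    """Try to restore the start state; False means a human should be called in."""
    for _ in range(max_attempts):
        run_reset_motion()
        if scene_is_ready():  # verify the reset instead of assuming it worked
            return True
    return False


class ToyScene:
    """Stand-in scene that is ready only after a given number of reset motions."""

    def __init__(self, resets_needed: int) -> None:
        self.resets_needed = resets_needed
        self.resets_done = 0

    def reset_motion(self) -> None:
        self.resets_done += 1

    def is_ready(self) -> bool:
        return self.resets_done >= self.resets_needed


scene = ToyScene(resets_needed=2)
print(reset_scene(scene.reset_motion, scene.is_ready), scene.resets_done)
```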
As Decrypt notes, a human initially helps the agent build reusable tools: the reset procedure and the reward function. These tools are then reused, and the agent takes over further policy improvement.
Demonstrations on Robots
In real-world experiments, the team tested ENPIRE on several manipulation tasks. Push-T checks whether a robot can push a T-shaped object into a designated area. Pin Insertion requires inserting pins into holes 4 mm in diameter. The team also demonstrated GPU installation and cable-tie manipulation.
Nvidia's project page states that on real-world manipulation tasks, the system succeeded 99% of the time when the agent was allowed up to eight attempts, each informed by its previous errors. This metric reflects the system's ability to recover from failures and retry with context, not the accuracy of a single isolated attempt.
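The difference between "success within eight attempts" and single-attempt accuracy is easy to see when both are computed from the same attempt logs. The Python sketch below uses made-up episodes, not ENPIRE data; `success_within_k` is our own illustrative helper.

```python
from typing import Sequence


def success_within_k(episodes: Sequence[Sequence[bool]], k: int = 8) -> float:
    """Fraction of episodes in which at least one of the first k attempts succeeded.

    Each episode is the ordered list of attempt outcomes (True = success).
    """
    if not episodes:
        raise ValueError("need at least one episode")
    solved = sum(1 for attempts in episodes if any(attempts[:k]))
    return solved / len(episodes)


# Made-up logs for four episodes, not ENPIRE data.
episodes = [
    [True],                 # solved on the first try
    [False, True],          # solved on the retry
    [False] * 8 + [True],   # solved only on the ninth attempt
    [False, False, False],  # never solved
]
print(success_within_k(episodes, k=8))  # counts episodes 1 and 2
print(success_within_k(episodes, k=1))  # single-attempt success rate
```

On these logs the within-eight rate is twice the first-attempt rate, which is why the two figures should not be compared directly.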
For the coding agent, the team compared Codex running GPT-5.5, Claude Code running Opus 4.7, and Kimi Code running Kimi K2.6. The comparison used the AutoEnvBench benchmark on the Push-T and Pin Insertion tasks.
The researchers also tested ENPIRE in RoboCasa, a simulator for household tasks such as opening cabinets and drawers and turning objects on or off in the kitchen. In these scenarios, ENPIRE outperformed Nvidia's GR00T and CaP-X, an agentic system that uses tools but does not run a complete automated-research loop.
Eight Robots Accelerated Learning
A separate section of the work focuses on scaling across a fleet of robots. Nvidia conducted an experiment on eight robotic stations, each equipped with its own hardware, computer, and AI programming agent.
The stations shared results via Git: a successful idea or code change could quickly spread among agents. This approach significantly reduced training time. According to Decrypt, transitioning from one robot to eight cut the time to master Push-T from about five hours to two. For Pin Insertion, the time decreased from over 90 minutes to around 40 minutes.
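As a rough worked example, the reported times can be turned into a speedup and a parallel efficiency (speedup divided by the number of robots). Because the source figures are approximate ("about five hours", "over 90 minutes"), so are the results: roughly 2.5x for Push-T and about 2.25x for Pin Insertion, or around 28 to 31 percent efficiency across eight robots. That distance from linear scaling is consistent with the utilization limits the authors point out.

```python
def speedup(time_single: float, time_fleet: float) -> float:
    """How many times faster the fleet reached the goal than a single robot."""
    return time_single / time_fleet


def parallel_efficiency(time_single: float, time_fleet: float, n_robots: int) -> float:
    """Speedup divided by fleet size; 1.0 would mean perfectly linear scaling."""
    return speedup(time_single, time_fleet) / n_robots


# Approximate figures as reported by Decrypt, in minutes.
reported = {
    "Push-T": (5 * 60, 2 * 60),
    "Pin Insertion": (90, 40),  # "over 90 minutes", so the true speedup is slightly higher
}
for task, (single, fleet) in reported.items():
    print(f"{task}: {speedup(single, fleet):.2f}x faster, "
          f"efficiency {parallel_efficiency(single, fleet, 8):.2f}")
```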
Limitations
The authors emphasized that scaling does not solve every problem. While agents read logs, write and debug code, or wait for responses from the underlying language model, the robots and compute resources sit partly idle. As the number of robots grows, GPU activity rises, but the average load on the robots themselves falls: agent teams spend more time merging results from other branches and coordinating, and less time conducting physical runs.
Another limitation is the increased token consumption. A larger fleet of robots accelerates policy development but requires more tokens due to log reading, idea sharing, and coordination among agents.
Additionally, ENPIRE has only been demonstrated on a limited set of manipulation tasks. Its results do not imply that robots can autonomously master arbitrary physical skills in an open environment without engineering preparation.
In June, Nvidia introduced the Isaac GR00T Reference Humanoid Robot — a research reference design for developing and testing humanoid robot skills. The configuration includes a Unitree H2 Plus body and tactile five-fingered hands from Sharpa Wave.
Previously, Unitree unveiled the "world's first ready-for-mass-production" piloted robot, capable of moving on either two or four limbs.
