Robots can now learn from humans by watching 'how-to' videos
04-24-2025

Robots have long struggled with flexibility. Until now, even the most advanced robotic systems have required massive amounts of data and painstaking instruction to complete basic tasks.

If a robot dropped a tool or failed to follow a script precisely, it would typically shut down or fail completely. However, a new breakthrough from Cornell University might change that dynamic entirely.

A team of computer scientists has recently developed a new AI-powered framework called RHyME (Retrieval for Hybrid Imitation under Mismatched Execution).

Robots learning from humans

The new technology allows robots to learn complex, multi-step tasks by watching just a single human demonstration, even when the way a human performs the task differs significantly from how the robot must carry it out.

“One of the annoying things about working with robots is collecting so much data on the robot doing different tasks. That’s not how humans do tasks. We look at other people as inspiration,” explained Kushal Kedia, a doctoral student in computer science at Cornell and the lead author of the study.

Kedia is set to present the team’s findings at the IEEE International Conference on Robotics and Automation in Atlanta, highlighting a system that could fundamentally alter how robots are trained.

From perfect scripts to practical adaptation

For decades, robotic learning has depended heavily on imitation. In a method known as “imitation learning,” robots watch human demonstrations to acquire new skills.

But this training has required extremely controlled demonstrations – human movements had to be smooth, precise, and consistent, or the robot wouldn’t be able to replicate the actions. Any deviation would result in failure.

“Our work is like translating French to English – we’re translating any given task from human to robot,” said senior author Sanjiban Choudhury, an assistant professor of computer science at Cornell’s Ann S. Bowers College of Computing and Information Science.

However, there’s a significant hurdle: humans are too fluid. Our movements are complex and unpredictable, often performed in ways that robots can’t easily mimic.

Training robots on video also traditionally required an enormous amount of footage to be even marginally successful. The mismatch between how humans behave and how robots are built to act has long been a stumbling block.

“If a human moves in a way that’s any different from how a robot moves, the method immediately falls apart,” Choudhury noted. “Our thinking was, ‘Can we find a principled way to deal with this mismatch between how humans and robots do tasks?’”

A smarter memory for smarter machines

RHyME is designed to overcome this long-standing challenge. Rather than trying to replicate human actions step by step, it equips robots with a sort of “common sense” memory system.

When a robot encounters a task it’s seen performed by a human – like putting a mug in the sink – it can recall and adapt related movements from its video archive, such as grasping a different object or performing a similar arm motion.

This process allows the robot to “connect the dots” even when human demonstrations don’t perfectly align with the robot’s own mechanics. RHyME essentially enables robots to synthesize new behaviors by creatively combining past examples.
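To make the retrieval idea concrete, here is a toy sketch (not the authors' implementation, and far simpler than the real system): each clip is summarized as a feature vector, and the robot looks up the archived clip whose embedding is most similar to the human demonstration. The clip names, embeddings, and three-dimensional vectors below are invented purely for illustration.

```python
import math

# Hypothetical archive of robot clips, each summarized by a
# hand-made feature vector. In a real system these embeddings
# would come from a learned video encoder.
robot_archive = {
    "grasp cup":   (0.9, 0.1, 0.0),
    "open drawer": (0.0, 1.0, 0.2),
    "wipe table":  (0.1, 0.2, 0.9),
}

def cosine(a, b):
    # Cosine similarity between two feature vectors.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def retrieve(query_embedding, archive):
    """Return the archived robot clip most similar to the query."""
    return max(archive, key=lambda name: cosine(query_embedding, archive[name]))

# A human demo of "put the mug in the sink" might embed close to
# the robot's "grasp cup" clip, so that clip is retrieved.
human_demo = (0.8, 0.2, 0.1)
print(retrieve(human_demo, robot_archive))  # → grasp cup
```

The point of the sketch is the matching step: the human and robot clips never have to line up frame by frame, only to land near each other in the shared embedding space.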

In laboratory trials, robots trained with RHyME achieved over a 50% improvement in task success rates compared to traditional training techniques.

Even more impressively, RHyME accomplished this using just 30 minutes of robot-specific data, dramatically reducing the training overhead that has long been a barrier in robotics.

“This work is a departure from how robots are programmed today,” Choudhury said. “The status quo of programming robots is thousands of hours of tele-operation to teach the robot how to do tasks. That’s just impossible. With RHyME, we’re moving away from that and learning to train robots in a more scalable way.”

Robots: A step toward home helpers

Though consumer-grade home assistant robots are still a distant dream, RHyME represents a significant step in that direction.

The ability for robots to learn quickly and flexibly by observing people – even from a single video – could one day make robotic helpers more practical, affordable, and accessible.

As automation becomes increasingly integrated into everyday life, systems like RHyME are poised to streamline robotic training across industries. From elder care to warehouse logistics, these smarter robots could adapt to the dynamic nature of real-world environments with much less human oversight.

For now, the researchers are continuing to refine RHyME and explore how it might scale to even more complex tasks.

But the core insight remains: instead of forcing robots to copy us precisely, we can teach them to draw inspiration from us – just like we do from one another.

Read the paper, “One-Shot Imitation under Mismatched Execution.”
