Training robots faster with new training method

Photo: Pixabay / Michal Jarmoluk
Wouter Hoeffnagel
21 August 2023
4 min

A new training method speeds up and simplifies the training of robots. The method provides insight into why a robot fails to complete a task, making it possible to fine-tune the robot's operation so that it can perform the task successfully in the future.

The method was developed by Andi Peng, an electrical engineering and computer science student at the Massachusetts Institute of Technology (MIT). Together with researchers at MIT, New York University and the University of California, Berkeley, she created a framework that allows humans to teach a robot a task with minimal effort. A key feature is an algorithm that, when a task fails, works out what changes are needed for the robot to perform the task successfully.

As a concrete example, MIT cites a cup that a robot should pick up but fails to. In this case, the algorithm can work out that the task might have succeeded had the cup been a different colour. The system then requests feedback from its human user on why the robot was unable to perform the task, and uses this feedback to create new training data.

Fine-tuning operation

The system uses this data to fine-tune the robot's operation. In practice, this fine-tuning consists of further refining the operation of a machine learning model previously trained to perform a specific task. Based on its findings and user feedback, the system can teach such a machine learning model to also perform a second task.

The researchers have put the method to the test, showing that their system can teach robots tasks more efficiently than other available methods. Robots trained using the framework also produce better results in practice, while the training process demands less of humans' time.

No technical knowledge required

The researchers also point out another important advantage: no specialist knowledge is required to use the system. It therefore allows users without technical expertise to teach robots new tasks, and to more easily teach robots to operate in new and unfamiliar spaces.

The latter is of great importance. In a practical environment, robots may encounter objects and spaces that they did not see during training, which can leave them not knowing how to act in a new environment.

Imitation learning

A well-known method for training robots is so-called 'imitation learning'. In practice, this means that a human user demonstrates an action, and the robot copies it. However, this training method can also teach a robot unintended things. As an example, Peng cites a training session in which a robot is trained using a white cup. It may thereby learn that all cups are white, and in practice have trouble picking up red or blue cups, for example.

"I don't want to perform demonstrations with 30,000 cups; I want to perform a demonstration with just one cup. However, then I need to train the robot so that it recognises that it can pick up a cup of any colour," Peng explains.

Data Augmentation

The researchers' system therefore teaches a robot which object the user wants it to focus on, in this case a cup. The system also identifies which elements are not important for the task, such as the colour of the cup. It then uses this information to generate new data by modifying these less important visual features, a process known as data augmentation.
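As a minimal sketch of this idea, the augmentation described above can be imitated in a few lines of Python. The data format, feature names and colour list here are illustrative assumptions, not the researchers' actual pipeline:

```python
import random

# Hypothetical demonstration record; the fields are illustrative
# assumptions, not the researchers' actual data format.
base_demo = {"object": "cup", "colour": "white", "action": "pick_up"}

# Colour is treated as task-irrelevant, so it may be varied freely.
IRRELEVANT_COLOURS = ["red", "blue", "green", "yellow", "black"]

def augment(demo, n):
    """Create n new demonstrations by perturbing an irrelevant feature."""
    variants = []
    for _ in range(n):
        variant = dict(demo)
        variant["colour"] = random.choice(IRRELEVANT_COLOURS)
        variants.append(variant)
    return variants

# One human demonstration becomes many synthetic ones.
augmented = augment(base_demo, 1000)
```

A real system would perform this perturbation on images or simulator states rather than symbolic records, but the principle is the same: only the task-irrelevant features change, while the object and action are preserved.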

The framework consists of three steps. The first step consists of identifying the task that the robot cannot perform successfully. Step two consists of users demonstrating the desired actions. In step three, the system generates so-called counterfactuals, which it uses to identify what needs to change for the robot to succeed in its task.

Generating additional demonstrations

The system then presents these counterfactuals to a human user, who provides feedback on them. In this way, the system determines which visual concepts do not impact the desired action. Based on this human feedback, the system then generates new demonstrations for training the robot. In practice, this means, for example, that a user picks up a cup once, after which the system generates additional demonstrations with thousands of cups to further train the robot.
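The loop described above (counterfactuals, human feedback, then new demonstrations) can be sketched in Python as follows. All function names, the feedback format and the feature values are assumptions made for illustration, not the researchers' implementation:

```python
# A single human demonstration and an observation the robot failed on.
demo = {"object": "cup", "colour": "white", "action": "pick_up"}
failed_observation = {"object": "cup", "colour": "red"}

def counterfactuals(demo, failed):
    """Minimal single-feature changes that turn the demonstration
    into the failed observation."""
    return [
        {**demo, feature: failed[feature]}
        for feature in failed
        if demo.get(feature) != failed[feature]
    ]

# The user reviews each counterfactual and marks features that do not
# affect the action. Here this stands in for real human feedback:
# the user declares colour irrelevant.
irrelevant = {"colour"}

def expand(demo, irrelevant, values):
    """Generate new demonstrations by varying only irrelevant features."""
    return [{**demo, f: v} for f in irrelevant for v in values[f]]

new_demos = expand(demo, irrelevant, {"colour": ["red", "blue", "green"]})
# Every new demonstration keeps the same object and action,
# but shows a differently coloured cup.
```

The key design point is that the human answers a cheap question ("does colour matter?") once, and the system converts that answer into many training examples, instead of asking the human to demonstrate each variant.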

So far, the system has only been tested on a simulated robot. The researchers want to test it on real robots in the near future. They also want to reduce the time the system takes to generate new data by using new generative machine learning models.

The researchers are presenting their research results at the International Conference on Machine Learning. In addition, more information is available in the paper published by the researchers.

Author: Wouter Hoeffnagel

Wouter Hoeffnagel

Wouter Hoeffnagel is a freelance journalist and copywriter with an interest in the manufacturing industry, IT and the intersection between the two. He writes a wide range of texts on these topics, ranging from background articles, interviews and news items to blog posts, white papers, case studies and website texts.