
Human-in-the-loop transfer learning in collision avoidance of autonomous robots

Today's article comes from the journal Biomimetic Intelligence and Robotics. The authors are Oriyama et al., from Waseda University in Japan. In this paper, they showcase a new Human-in-the-Loop framework aimed at improving collision avoidance in autonomous robots. Today's episode is hosted by Brett Beckermann.

DOI: 10.1016/j.birob.2025.100215


You might call it common sense. Or gut instinct. Or a hunch. But what is it? What is intuition? And what would it mean for AI to possess it?

In today's paper, researchers explore that idea with a new technique based on RLHF: Reinforcement Learning from Human Feedback. They argue that they've developed a better way to transfer knowledge to a model during feedback rounds, creating something akin to intuition. We'll start by exploring how neural networks are used in autonomous driving, how they help with collision detection, and how human feedback is used to reduce the need for trial and error. Then we'll walk through these authors' contribution: a new framework that can harness human expertise and transfer it to a model more successfully. Before we go any further, let's break down neural networks and how they're trained.

Neural networks are essentially pattern recognition systems that learn from data. Training them involves feeding them information, allowing them to make predictions, and then comparing those predictions to the correct answer. The network then adjusts its internal settings, known as weights, using a method called backpropagation. This is how the model reduces its errors and improves over time.
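
To make that loop concrete, here is a minimal sketch in Python: a tiny network predicts a score from two toy "sensor" inputs, the prediction is compared to the correct answer, and backpropagation nudges the weights to shrink the error. The data, network size, and learning rate are invented purely for illustration and are not taken from the paper.

```python
# Minimal illustration of the predict-compare-adjust training loop
# (toy data and a made-up task, not the paper's setup).
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 2 "sensor" readings in, 1 collision-risk score out.
X = rng.uniform(0, 1, size=(100, 2))
y = (X.sum(axis=1, keepdims=True) > 1.0).astype(float)  # arbitrary target rule

# One hidden layer with 8 units; weights start as small random numbers.
W1, b1 = rng.normal(0, 0.5, (2, 8)), np.zeros(8)
W2, b2 = rng.normal(0, 0.5, (8, 1)), np.zeros(1)
lr = 0.5  # learning rate

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for step in range(2000):
    # 1. Forward pass: make predictions.
    h = sigmoid(X @ W1 + b1)
    pred = sigmoid(h @ W2 + b2)

    # 2. Compare predictions to the correct answers.
    err = pred - y

    # 3. Backpropagation: push the error back through the network and
    #    adjust the weights slightly in the direction that reduces it.
    d_pred = err * pred * (1 - pred)
    d_h = (d_pred @ W2.T) * h * (1 - h)
    W2 -= lr * (h.T @ d_pred) / len(X)
    b2 -= lr * d_pred.mean(axis=0)
    W1 -= lr * (X.T @ d_h) / len(X)
    b1 -= lr * d_h.mean(axis=0)

print("mean squared error after training:", float(np.mean((pred - y) ** 2)))
```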

To put this into perspective, imagine a small bumper car equipped with sensors that detect collisions. It can move forwards, backwards, left, and right. To build a picture of its surroundings, it starts moving randomly, bumping into the edges of the track. The more it moves, the more it learns the boundaries, and the smarter it becomes. Through trial and error, it gathers enough data to navigate the track while avoiding collisions.

Now, a closed-off track is perfect for this type of training. With predictable barriers and clear pathways, the neural network can quickly improve. But if you take that same bumper car and place it in a busy city center, things get much more complicated. Suddenly, it has to deal with pedestrians, other vehicles, changing light conditions, environmental shifts, and inconsistent barriers. Now, the neural network has to work even harder to make real-time decisions.

In these complex, real-world environments, gathering enough data is extremely time-consuming, and training on the fly presents safety risks. Imagine training a self-driving car in a busy city for the first time: even one collision is one too many.

Researchers have been working on this problem for years. How can we speed up model training without relying solely on slow trial-and-error learning?

For a long time, pure reinforcement learning has been the go-to method. It works by rewarding correct actions and penalizing incorrect ones, encouraging the model to maximize positive outcomes over time.
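
As a rough sketch of that reward-and-penalty loop, the snippet below uses a simplified value table updated in a Q-learning style: the agent tries actions, collects a reward or a penalty, and gradually favors the actions that score well. The states, actions, and reward values here are assumptions made for illustration only, not the paper's actual setup.

```python
# Simplified reinforcement learning loop: try actions, get rewards or
# penalties, and update a value table (states/actions/rewards are invented).
import random

states = ["clear", "obstacle_front", "obstacle_left", "obstacle_right"]
actions = ["forward", "backward", "turn_left", "turn_right"]
Q = {(s, a): 0.0 for s in states for a in actions}
alpha, epsilon = 0.1, 0.2  # learning rate and exploration rate

def reward(state, action):
    # Toy reward model: driving forward into a front obstacle counts as a
    # collision and is penalized; everything else earns a reward.
    if state == "obstacle_front" and action == "forward":
        return -1.0
    return 1.0

for episode in range(5000):
    s = random.choice(states)
    # Epsilon-greedy choice: usually pick the best-known action, sometimes explore.
    if random.random() < epsilon:
        a = random.choice(actions)
    else:
        a = max(actions, key=lambda act: Q[(s, act)])
    r = reward(s, a)
    # Nudge the estimate for this (state, action) pair toward the observed reward.
    Q[(s, a)] += alpha * (r - Q[(s, a)])

print("learned choice with an obstacle ahead:",
      max(actions, key=lambda a: Q[("obstacle_front", a)]))
```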

More recently, Reinforcement Learning from Human Feedback has helped improve training time and alignment with human preferences, predictions, and judgment. Chatbots like ChatGPT, Claude, and Gemini have used this method to refine their conversational abilities over time; early chatbot responses were factual but often rude or confusing. In the world of autonomous vehicles, self-driving cars likewise need enough knowledge to handle small, unpredictable obstacles that were never part of the training dataset. By incorporating human feedback, AI models can become more natural and intuitive.

In most RLHF systems, you'd train your model first and then provide feedback to it after it's already up and running. The authors did something different: they built a system that could provide a form of "pre-feedback" to the model well before training had taken place. So rather than starting its training process from a blank slate, they seeded the model with what they called "intuition." This might be a bit counterintuitive, so let's walk through their process in detail:

  • Setup: They created a little robot that could initially only move around randomly, and collect sensor data while it did so.
  • Pre-Training: They'd let the robot move for a moment, then stop it and provide feedback on the actions it should have taken during its run. This is called Human-in-the-Loop (HITL). For example, if it had encountered an obstacle too close to its front sensor, the robot would be instructed to move backwards the next time that happened.
  • Training: Now equipped with basic human knowledge, the robot learns on its own, refining its behavior through trial and error. You might be used to thinking of "training" as something that uses a dataset, but trial and error is an equally valid form of training; we call this "reinforcement learning." In this case, the learning process is supercharged because the model already has a starting strategy. It has "intuition," so it learns much faster and avoids unnecessary mistakes (a rough sketch of this seeding idea follows this list).
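
Here is that rough sketch of the seeding idea, continuing the value-table picture from the earlier snippet: the table starts blank, human corrections are written into it as initial preferences, and only then does reinforcement learning take over. The states, actions, and seed values are illustrative assumptions, not the authors' actual encoding.

```python
# "Seed first, then learn": write human feedback into the value table before
# reinforcement learning starts (rules and numbers are illustrative only).
states = ["clear", "obstacle_front", "obstacle_left", "obstacle_right"]
actions = ["forward", "backward", "turn_left", "turn_right"]

# 1. Blank slate, as in ordinary reinforcement learning.
Q = {(s, a): 0.0 for s in states for a in actions}

# 2. Human-in-the-loop pre-training: the operator stops the robot and states
#    what it *should* have done; each correction becomes an initial preference.
human_feedback = [
    ("obstacle_front", "backward"),    # "back up when something is close ahead"
    ("obstacle_left",  "turn_right"),  # "steer away from an obstacle on the left"
    ("obstacle_right", "turn_left"),   # "steer away from an obstacle on the right"
    ("clear",          "forward"),     # "keep moving when the way is clear"
]
for state, preferred_action in human_feedback:
    Q[(state, preferred_action)] = 1.0  # seed an initial "intuition"

# 3. Reinforcement learning then starts from this seeded table rather than
#    from zero, so far fewer collisions are needed to reach good behavior.
print("first choice with an obstacle ahead:",
      max(actions, key=lambda a: Q[("obstacle_front", a)]))
```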

By injecting intuition early on, this HITL approach makes robots more efficient, adaptable, and better equipped to handle dynamic environments, unexpected obstacles, and even hardware malfunctions.

For their experiment, the researchers used a small Arduino-based robot called Zumo. These small tracked robots can be customized for different tasks. In this case, the Arduino board was equipped with four ultrasonic sensors (front, back, left, and right), each measuring distances between 2 cm and 400 cm, plus Bluetooth connectivity for remote monitoring from a PC. The motors' speed was controlled using Pulse Width Modulation (PWM), with each side's motor controlled independently. This setup allowed the robot to maneuver easily within the test environment.
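
As a purely illustrative sketch (not the authors' firmware), the snippet below shows how independently controlled PWM values for the left and right motors can map to the robot's four basic moves; the signed duty-cycle numbers are assumptions chosen only to make the idea concrete.

```python
# Illustrative mapping from high-level actions to per-side PWM commands
# for a differential-drive robot (values are assumed, not from the paper).
ACTIONS = {
    "forward":    (150, 150),    # both sides drive forward
    "backward":   (-150, -150),  # both sides reverse
    "turn_left":  (-120, 120),   # left reverses, right drives: spin left
    "turn_right": (120, -120),   # left drives, right reverses: spin right
}

def drive(action):
    left_pwm, right_pwm = ACTIONS[action]
    # On the real robot these values would be written to the motor driver;
    # here we simply report them.
    print(f"{action}: left PWM = {left_pwm}, right PWM = {right_pwm}")

for a in ACTIONS:
    drive(a)
```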

The team then conducted six experiments inside a 150 cm x 150 cm enclosure, each designed to test different aspects of training.

  • Experiment A. Pre-Training Only: For the first experiment, researchers tested how the robot would perform with human-guided pre-training but no reinforcement learning. The robot performed basic collision avoidance, but after 50 actions it stalled in the center of the enclosure, unable to refine its movements further.
  • Experiment B. Reinforcement Learning Only: Next, the researchers tested what would happen if the robot only received reinforcement learning, without any human-guided pre-training. The robot struggled in the beginning, immediately colliding with the wall in front of it. Its early movements showed a high level of indecisiveness, with many patterns repeated before it ultimately collided with the wall again.
  • Experiment C. Combined Pre-Training and Reinforcement Learning: The researchers then combined the two methods, pre-training the robot with the human feedback from experiment A and using reinforcement learning to fine-tune its movements. This approach performed far better than the first two. The robot was able to move to the center of the enclosure no matter where it began, and once in the center, it moved freely while avoiding any collisions.
  • Experiment D. Dynamic/Moving Obstacles: For this experiment, researchers added a second, dynamically moving robot, which shuttled side to side along a black line. The goal of this test was to see how the robot reacted to a dynamic obstacle using the same pre-training data as before, combined with reinforcement learning. The robot was able to approach the moving obstacle and adjust its trajectory to avoid any collisions. Without pre-training, the robot failed to react in time, colliding with the obstacle robot multiple times.
  • Experiment E. Faulty Sensor: In this experiment, one of the four sensors was made faulty by introducing random noise into its readings. The researchers tested both with and without pre-training and found that with pre-training, the robot was able to compensate for the faulty sensor and navigate using the remaining three. Without pre-training, the robot often moved in completely the wrong direction, only gradually learning to ignore the incorrect readings.
  • Experiment F. Faulty Motor: Lastly, the researchers performed a similar test to the one in experiment E, but this time instead of a faulty sensor, they reduced the power of one of the motors by 50%. With pre-training, the robot compensated for the reduced speed and altered its movements accordingly. Without pre-training, the robot struggled significantly, getting stuck against obstacles, unable to adjust its trajectory effectively.

The results from these experiments were clear. By introducing human intuition up front, the robot was able to dive straight into reinforcement learning and make better use of it. Pre-training reduced the need for trial and error and allowed the robot to overcome malfunctions like sensor and motor failures. In real-world scenarios, this training method could allow autonomous vehicles to learn quickly and with minimal damage to their surroundings.

Unfortunately, there are also downsides to HITL training. Human bias and scalability are areas that still require additional research. It's one thing to add human intuition to a miniature robot; it's another to guide a vehicle or drone through a complex environment. Future work will hopefully enable robots to inherit human intentions and apply HITL to a wider range of tasks, handling errors gracefully and generalizing to new kinds of obstacles. As AI continues to evolve, we will find new ways to refine machine learning processes so that reactions and decision-making become faster and more natural.

From personal experience, I know how frustrating collision tests can be in an automation environment. Even assembly robots, which are packed into tight spaces, are susceptible to collisions with their neighboring robots. Papers like this are paving the way toward a world where incidents like these are a thing of the past.

To have a look at the robot the researchers used and its movements through each experiment, download a copy of the paper.