Today's article comes from the journal Biomimetic Intelligence and Robotics. The authors are Oriyama et al., from Waseda University, in Japan. In this paper, they showcase a new human-in-the-loop (HITL) framework aimed at improving collision avoidance in autonomous robots. Today's episode is hosted by Brett Beckermann.
DOI: 10.1016/j.birob.2025.100215
You might call it common sense. Or gut instinct. Or a hunch. But what is it? What is intuition? And what would it mean for AI to possess it?
In today’s paper, researchers explore that idea, with a new technique based on RLHF: Reinforcement Learning from Human Feedback. They argue that they’ve developed a better way to transfer knowledge to a model during feedback rounds, creating something akin to intuition. We’ll start by exploring how neural networks are used in autonomous driving, how they help with collision detection, and how human feedback is used to reduce the need for trial and error. Then we’ll walk through these authors’ contribution: a new framework that can harness human expertise and transfer it to a model more successfully. Before we go any further, let’s break down neural networks and how they’re trained.
Neural networks are essentially pattern recognition systems that learn from data. Training them involves feeding them information, allowing them to make predictions, and then comparing those predictions to the correct answer. The network then adjusts its internal settings, known as weights, using a method called backpropagation. This helps AI models improve over time by reducing errors.
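To make that loop concrete, here's a minimal sketch in Python using only NumPy. The dataset (the classic XOR problem), layer sizes, and learning rate are illustrative choices of mine, not anything from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dataset: XOR, a classic pattern a single-layer model can't learn.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

# Randomly initialized weights and biases for one hidden layer.
W1, b1 = rng.normal(size=(2, 8)), np.zeros(8)
W2, b2 = rng.normal(size=(8, 1)), np.zeros(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 0.5
for _ in range(5000):
    # Forward pass: make a prediction.
    hidden = sigmoid(X @ W1 + b1)
    pred = sigmoid(hidden @ W2 + b2)
    # Compare the prediction to the correct answer.
    err = pred - y
    # Backpropagation: push the error back through the network...
    grad_out = err * pred * (1 - pred)
    grad_hid = (grad_out @ W2.T) * hidden * (1 - hidden)
    # ...and adjust the weights to reduce it.
    W2 -= lr * hidden.T @ grad_out
    b2 -= lr * grad_out.sum(axis=0)
    W1 -= lr * X.T @ grad_hid
    b1 -= lr * grad_hid.sum(axis=0)

print(np.round(pred, 2))  # approaches [[0], [1], [1], [0]] as errors shrink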
To put this into perspective, imagine a small bumper car equipped with sensors that detect collisions. It can move forwards, backwards, left, and right. To build a picture of its surroundings, it starts moving randomly, bumping into the edges of the track. The more it moves, the more it learns the boundaries, and the smarter it becomes. Through trial and error, it gathers enough data to navigate the track while avoiding collisions.
Now, a closed-off track is perfect for this type of training. With predictable barriers and clear pathways, the neural network can quickly improve. But if you take that same bumper car and place it in a busy city center, things get much more complicated. Suddenly, it has to deal with pedestrians, other vehicles, changing light conditions, environmental shifts, and inconsistent barriers. Now, the neural network has to work even harder to make real-time decisions.
In these complex, real-world environments, gathering enough data is extremely time-consuming, and training on the fly presents safety risks. Imagine training a self-driving car in a busy city for the first time: one collision is already one too many.
Researchers have been working on this problem for years. How can we speed up model training without relying solely on slow trial-and-error learning?
For a long time, pure reinforcement learning has been the go-to method. It works by rewarding correct actions and penalizing incorrect ones, encouraging the model to maximize positive outcomes over time.
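Here's that reward-and-penalty loop sketched as tabular Q-learning, one of the simplest reinforcement learning algorithms. The toy corridor environment and the reward values are invented for illustration; they aren't the paper's setup:

```python
import random

N = 10                        # corridor cells 0..9; walls beyond each end
ACTIONS = [-1, +1]            # move left, move right
Q = [[0.0, 0.0] for _ in range(N)]
alpha, gamma, eps = 0.1, 0.9, 0.2

state = N // 2
for _ in range(20000):
    # Epsilon-greedy: mostly exploit what we know, sometimes explore.
    if random.random() < eps:
        action = random.randrange(2)
    else:
        action = 0 if Q[state][0] >= Q[state][1] else 1
    nxt = state + ACTIONS[action]
    if nxt < 0 or nxt >= N:            # bumped into a wall
        reward, nxt = -10.0, state     # penalize the collision, stay put
    else:
        reward = 1.0                   # reward a safe move
    # Core update: nudge the estimate toward reward + discounted future value.
    Q[state][action] += alpha * (reward + gamma * max(Q[nxt]) - Q[state][action])
    state = nxt

print(Q[0], Q[N - 1])  # next to each wall, the "into the wall" action scores lower
```

After enough steps, the table alone steers the agent away from the walls, which is exactly the trial-and-error bumper-car behavior described above.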
More recently, Reinforcement Learning from Human Feedback has helped shorten training time and improve alignment with human preferences, predictions, and judgment. Chatbots like ChatGPT, Claude, and Gemini have used this method to refine their conversational abilities over time; early chatbot responses were often factual yet blunt or confusing. In the world of autonomous vehicles, self-driving cars likewise need enough knowledge to overcome small, unpredictable obstacles that were never part of the training dataset. By incorporating human feedback, AI models can become more natural and intuitive.
In most RLHF systems, you train your model first and then provide feedback once it's already up and running. The authors did something different: they built a system that provides a form of “pre-feedback” to the model well before training takes place. Rather than starting the training process from a blank slate, they seeded the model with what they called “intuition.” This might sound a bit counterintuitive, so let’s walk through their process in detail:
By injecting intuition early on, the HITL model makes robots more efficient, adaptable, and better equipped to handle dynamic environments, unexpected obstacles, and even hardware malfunctions.
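To give a feel for what seeding could look like, here's a hedged sketch in the same Q-learning spirit as before: a human rule of thumb pre-fills the value table before any learning runs. The rule, states, and numbers are my illustrative assumptions, not the authors' actual injection mechanism:

```python
ACTIONS = ["forward", "backward", "left", "right"]

def human_prior(obstacle_ahead):
    """A human rule of thumb: if something is close in front, backing up
    is probably better than driving forward. Values are illustrative."""
    if obstacle_ahead:
        return {"forward": -5.0, "backward": 2.0, "left": 1.0, "right": 1.0}
    return {action: 0.0 for action in ACTIONS}

# Two coarse states for the sketch: "clear ahead" and "obstacle ahead".
# Instead of an all-zeros table, the human prior is baked in before training.
Q = {state: human_prior(state) for state in (False, True)}

# Ordinary reinforcement learning would now refine Q as usual, but the
# robot's very first actions already avoid the obvious collision instead
# of discovering the penalty by crashing.
print(Q[True])
```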
For their experiment, the researchers used a small Arduino-based robot called Zumo. These robots are designed for movement tracking and can be customized for different tasks. In this case, the Arduino board was equipped with four ultrasonic sensors (front, back, left, and right) that measure distances between 2 cm and 400 cm, plus Bluetooth connectivity for remote monitoring from a PC. Motor speed was controlled using pulse-width modulation (PWM), with each side's motor driven independently. This setup allowed the robot to maneuver easily within the test environment.
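For a sense of how a PC might talk to such a robot, here's a hypothetical host-side sketch using pyserial. The port name and the line-based message format are assumptions for illustration, not the authors' actual protocol:

```python
import serial  # pyserial

# Hypothetical Bluetooth serial port; adjust for your OS and pairing.
link = serial.Serial("/dev/rfcomm0", 9600, timeout=1.0)

def read_distances():
    """Parse one line of comma-separated distances in cm: front,back,left,right."""
    line = link.readline().decode(errors="ignore").strip()
    front, back, left, right = (float(v) for v in line.split(","))
    return front, back, left, right

def drive(left_pwm, right_pwm):
    """Send one PWM duty cycle per side (in this made-up protocol,
    negative values mean reverse)."""
    link.write(f"{left_pwm},{right_pwm}\n".encode())

front, back, left, right = read_distances()
if front < 10.0:        # obstacle within 10 cm ahead: pivot away from it
    drive(-100, 100)
else:
    drive(150, 150)     # clear ahead: cruise forward
```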
The team then conducted six experiments inside a 150 cm × 150 cm enclosure, each designed to test different aspects of training.
The results from these experiments were clear. By introducing human intuition, the robot was able to dive straight into effective reinforcement learning. Pre-training reduced the need for trial and error and allowed the robot to overcome malfunctions like sensor and motor failures. In real-world scenarios, this training method could allow autonomous vehicles to learn quickly and with minimal damage to their surroundings.
Unfortunately, there are also downsides to HITL training. Issues like human bias and scalability still require additional research. It's one thing to add human intuition to a miniature robot; it's another to steer a vehicle or drone through a complex environment. Future work will hopefully enable robots to inherit human intentions and apply HITL to a wider range of tasks, accounting for errors and generalizing across different kinds of obstacles. As AI continues to evolve, we will find new ways to refine machine learning processes so that reactions and decision-making become faster and more natural.
From personal experience, I know how frustrating collision tests can be in an automation environment. Even assembly robots, which are packed into tight spaces, are susceptible to collisions with neighboring robots. Papers like this are paving the way toward a world where collisions like these are a thing of the past.
To have a look at the robot the researchers used and its movements through each experiment, download a copy of the paper.