Free Sample Episode

ChatMPC: a language-driven model predictive control framework for adaptive and personalized autonomous driving

Today's article comes from the journal Autonomous Intelligent Systems. The authors are Xu et al., from Tongji University, in China. In this paper they're building a system that integrates Natural Language Processing with Model Predictive Control to enable adaptive, personalized autonomous driving.

DOI: 10.1007/s43684-025-00116-x


Let's say you're driving down the highway...sort of. You're in the driver's seat, and your hands are somewhere near the wheel. But in reality, your car is doing most of the actual "driving" here. It's semi-autonomous, so it's capable of steering itself, maintaining its speed, following bends in the road, and keeping a safe distance from the car in front of it.

What it can't seem to do is take instruction.

If you want the system to behave differently (pass that truck ahead, follow more cautiously, drive more aggressively, etc.), you're going to have to take over manual control.

But what if you didn't? What if you could get the car to modify its "driving" style, or take specific actions on the road, in response to a verbal cue? What if you could modify how the car steers, how it navigates, the speeds it chooses to drive at, the safety tradeoffs it makes, and even the objective function of its controller...all with just a few commands?

Most autonomous driving systems today don't work like that. They rely on static objectives and fixed control strategies. Rule-based decision trees, threshold controllers, logical reasoning systems. These approaches work fine in simple, predictable environments. But in the real world, you might want to be aggressive one minute and cautious the next. Your needs aren't static, and they're not the same as the next person's. And that's where these systems fall short. They can't adapt. They can't personalize.

In this paper, the authors are proposing a fix for that. They call it ChatMPC. It's a system that integrates natural language processing with what's called Model Predictive Control to enable adaptive, personalized autonomous driving. Their framework uses a transformer-based sentence embedding model to parse driving intents from natural language commands, then dynamically updates the MPC controller's objective functions and constraints to generate personalized driving behaviors aligned with user preferences.

On today's episode we'll walk through how they designed this system, and how it works. How it parses natural language into actionable control parameters, how it runs on top of a kinematic vehicle model, and how it reconfigures MPC objectives in real time. Let's dive in.

Model Predictive Control is one of the most widely used approaches in autonomous driving. It's mathematically rigorous, it plans trajectories over finite time horizons, and it can handle constraints like speed limits and safety distances. Here's how it works. At each control step, the system solves an optimization problem over a prediction horizon, executes the first control action, then shifts the horizon forward and repeats. This rolling optimization allows the controller to continuously adapt to changing conditions while respecting physical and safety constraints.
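That receding-horizon pattern can be sketched in a few lines. This is a minimal toy, not the paper's controller: `solve_horizon` stands in for the constrained optimization (here it just nudges the state toward a reference), but the loop structure (optimize over a horizon, execute only the first action, shift forward, repeat) is the real MPC skeleton.

```python
def solve_horizon(state, reference, horizon):
    """Stand-in for the MPC optimizer: returns a planned control sequence.
    A real solver would minimize a cost subject to constraints."""
    return [0.5 * (reference - state) for _ in range(horizon)]

def step_dynamics(state, control):
    """Toy first-order dynamics: the state moves by the applied control."""
    return state + control

def mpc_loop(state, reference, horizon=10, steps=50):
    trajectory = [state]
    for _ in range(steps):
        plan = solve_horizon(state, reference, horizon)  # optimize over the horizon
        state = step_dynamics(state, plan[0])            # execute only the first action
        trajectory.append(state)                          # then shift the horizon and repeat
    return trajectory

traj = mpc_loop(state=0.0, reference=10.0)
print(round(traj[-1], 3))  # the rolling optimization converges toward the reference
```

Because only the first action of each plan is ever executed, the controller gets to re-plan at every step with fresh state information, which is what lets it adapt to changing conditions.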

But traditional MPC systems have a limitation: their configuration is static. The objective functions are predefined. The constraints are fixed. The parameters don't change based on what you want right now. So when you're stuck behind a slow-moving truck and you want to overtake, the system can't just shift gears and become more aggressive. When you've got a baby in the backseat and you want the car to maintain a longer-than-normal following distance and avoid any sudden movements, it can't just dial down the responsiveness. These things are locked in at design time.

In this paper they're proposing a solution. They use a pre-trained language model to parse driver intent from natural language. Then they reconfigure the MPC controller (on the fly) based on that intent. The language model handles the semantic understanding, the MPC handles the safety and control execution. It's a hybrid architecture, with two main modules: an Intent Recognition Module and an MPC Controller Module. You say something, the intent module figures out what you want, and the controller module reconfigures itself to do it. Let's look a little deeper.

The Intent Recognition Module takes a natural language command from the driver and translates it into a structured, machine-readable intent. It does this in two steps.

First, a Language Encoder uses Sentence-BERT to generate a high-dimensional embedding vector that captures the semantic features of the command. Sentence-BERT is a modification of BERT that uses dual and triplet network structures to derive semantically meaningful sentence embeddings. The key difference from vanilla BERT is that Sentence-BERT applies a mean pooling operation over the output embeddings from the final layer, averaging the token vectors to create a single vector that represents the entire sentence. So whether you say "I want to overtake" or "Let's pass this car" or "Can we go around," the embedding captures the underlying intent in the same semantic space. The vector encodes not just individual word meanings but the contextual relationship between words, allowing the system to understand that "overtake the car ahead" and "can we get past the car in front" are semantically equivalent despite different word choices.
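The mean-pooling step is easy to sketch. Below, random vectors stand in for BERT's final-layer token embeddings (a real pipeline would get these from a Sentence-BERT model); the pooling itself, averaging the non-padding token vectors into one fixed-size sentence vector, works the same way.

```python
import numpy as np

def mean_pool(token_embeddings, attention_mask):
    """Average token vectors into one sentence vector, ignoring padding."""
    mask = np.asarray(attention_mask, dtype=float)[:, None]  # shape (tokens, 1)
    summed = (token_embeddings * mask).sum(axis=0)           # sum real tokens only
    count = mask.sum()                                        # number of real tokens
    return summed / count

# Stand-in for BERT's output: 6 tokens, 8-dim vectors, last two are padding.
rng = np.random.default_rng(0)
tokens = rng.normal(size=(6, 8))
mask = [1, 1, 1, 1, 0, 0]

sentence_vec = mean_pool(tokens, mask)
print(sentence_vec.shape)  # → (8,): one fixed-size vector for the whole sentence
```

Whatever the command's length or phrasing, the result is a single fixed-size vector, which is what makes the downstream nearest-neighbor comparison possible.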

Second, a Task Extractor uses that embedding to classify the command into a target form. This is where K-Nearest Neighbors comes in. The authors built a training set of commands across three primary intent categories: car-following, overtaking, and maintaining distance. Each training command was encoded into a vector using the same Sentence-BERT model. These vectors form the intent vector library. When a new command comes in, the system encodes it, then calculates the distance between this new vector and all vectors in the library. The nearest neighbors are retrieved, and the most common label among those neighbors determines the classified intent. This approach enables accurate intent mapping without rigid templates or grammatical rules. The classifier doesn't need to match exact phrases; it just needs to find semantically similar examples in the training set.
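Here's that lookup in miniature. The 2-D vectors below are toy stand-ins for Sentence-BERT embeddings, and the distance metric and k value are illustrative choices, but the mechanics (rank the library by distance, take the majority label among the k nearest) match the approach described.

```python
import numpy as np
from collections import Counter

def cosine_distance(a, b):
    """1 - cosine similarity: small when two vectors point the same way."""
    return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def knn_intent(query_vec, library, k=3):
    """library: list of (embedding, label) pairs. Majority vote over k nearest."""
    ranked = sorted(library, key=lambda item: cosine_distance(query_vec, item[0]))
    labels = [label for _, label in ranked[:k]]
    return Counter(labels).most_common(1)[0][0]

# Toy 2-D "embeddings" standing in for the intent vector library.
library = [
    (np.array([1.0, 0.1]), "overtake"),
    (np.array([0.9, 0.2]), "overtake"),
    (np.array([0.1, 1.0]), "car-following"),
    (np.array([0.2, 0.9]), "car-following"),
    (np.array([0.6, 0.6]), "maintain-distance"),
]

print(knn_intent(np.array([0.95, 0.15]), library))  # → overtake
```

A new phrasing like "can we get past this car" would land near the overtaking examples in embedding space and pick up that label, with no template matching required.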

The structured output includes path objectives like target position and direction, control preferences like smoothness or speed priority, and safety constraints specific to the maneuver. For example, if the intent is "overtake," the system knows it needs to plan a lane change, relax lateral position constraints to allow the maneuver, and shift the target position to ahead of the lead vehicle.

Once the intent is structured, it gets passed to the MPC Controller Module. This is where the dynamic reconfiguration happens. The MPC optimization problem is defined by an objective function and a set of constraints, and both of these change based on the parsed intent.

For a car-following command, for example, the controller sets the longitudinal reference state to maintain a fixed safety distance behind the lead vehicle. The lateral reference is set to the current lane center. The control time step is set to a smaller value to enable rapid response to the lead vehicle's speed changes. And finally, lateral position constraints are tightened to enforce strict lane-keeping. This configuration prioritizes tracking accuracy and stability over maneuverability.

By contrast, for an overtake command, the configuration changes completely. The longitudinal reference shifts to a target position ahead of the lead vehicle. Lateral constraints are relaxed to permit the necessary lane-changing maneuvers. Then the prediction time step is increased to plan for the extended overtaking horizon. This configuration prioritizes maneuverability and progress over strict lane adherence, allowing the vehicle to execute the overtake efficiently while still respecting safety margins.
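The contrast between those two modes can be captured as a simple intent-to-configuration lookup. The parameter names and numeric values below are assumptions for the sketch, not the paper's actual settings, but the direction of each change (tighter lateral bounds and a smaller time step for car-following; relaxed bounds and a longer step for overtaking) follows the text.

```python
# Illustrative intent-to-configuration map. Names and values are assumed
# for this sketch; the paper's actual parameters will differ.
MPC_CONFIGS = {
    "car-following": {
        "longitudinal_ref": "fixed_gap_behind_lead",  # hold a safety distance
        "lateral_ref": "lane_center",
        "control_dt": 0.05,            # small step: rapid response to lead speed
        "lateral_bounds": (-0.2, 0.2), # tight: strict lane-keeping (meters)
    },
    "overtake": {
        "longitudinal_ref": "ahead_of_lead",          # target beyond the lead car
        "lateral_ref": "adjacent_lane",
        "control_dt": 0.2,             # larger step: plan the longer maneuver
        "lateral_bounds": (-3.5, 3.5), # relaxed: allow the lane change
    },
}

def configure_mpc(intent):
    try:
        return MPC_CONFIGS[intent]
    except KeyError:
        raise ValueError(f"no configuration for intent: {intent}")

cfg = configure_mpc("overtake")
print(cfg["lateral_bounds"])  # → (-3.5, 3.5): relaxed for the lane change
```

The point of the design is that the optimizer itself never changes; only this configuration layer does, which keeps the safety-critical solver untouched while the language side swaps its inputs.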

The objective function itself is defined as a weighted combination of terminal cost, state cost, and input cost. It minimizes the difference between where the car ends up and where you want it to be, while also penalizing excessive control effort to maintain ride comfort. In other words: the system wants to get you where you're going, but it also doesn't want to throw you around with harsh braking or sudden steering.
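Written out, that weighted combination takes the standard quadratic MPC form. The notation here is the conventional one (it may differ cosmetically from the paper's): \(x_k\) and \(u_k\) are the state and control input at step \(k\), \(x_{\mathrm{ref}}\) is the reference state, and \(P\), \(Q\), \(R\) are the weight matrices for the terminal, state, and input costs.

```latex
J = \underbrace{\| x_N - x_{\mathrm{ref}} \|_P^2}_{\text{terminal cost}}
  + \sum_{k=0}^{N-1} \Big(
      \underbrace{\| x_k - x_{\mathrm{ref}} \|_Q^2}_{\text{state cost}}
    + \underbrace{\| u_k \|_R^2}_{\text{input cost}}
    \Big)
```

The first two terms pull the trajectory toward where you want to be; the \(R\)-weighted term penalizes harsh braking and sudden steering, which is the ride-comfort half of the tradeoff.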

The weight matrices are tuned based on the driving intent. For aggressive overtaking, the weights might emphasize rapid position changes and tolerate higher acceleration and steering rates. For conservative car-following, the weights would emphasize smooth tracking and penalize sharp control inputs more heavily. This is how the same underlying controller can produce radically different driving behaviors based on what you asked for.

The system also integrates explicit safety constraints into the optimization problem. Across all driving modes, it enforces a hard safety distance constraint to prevent collisions. Each vehicle is modeled using two circles centered at the front and rear axle centers. The distance between any circle centers on the rear and lead vehicles must stay above a minimum threshold at all times. This multi-circle representation is more accurate than point-mass models because it accounts for the actual length and geometry of the vehicles. A point-mass model might allow configurations where the centers of mass are far apart but the vehicles are actually overlapping. The two-circle model prevents this by requiring separation between both the front and rear portions of each vehicle. It's a simple but effective way to ensure that safety constraints reflect physical reality.
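A minimal version of that two-circle check looks like this. The wheelbase and threshold values are illustrative, but the geometry matches the description: place one circle center at each axle, then require every cross-vehicle center pair to stay above the minimum distance.

```python
import math

def circle_centers(x, y, heading, wheelbase):
    """Two circle centers at the front and rear axles of a vehicle.
    (x, y) is the vehicle's midpoint; heading is in radians."""
    half = wheelbase / 2.0
    front = (x + half * math.cos(heading), y + half * math.sin(heading))
    rear = (x - half * math.cos(heading), y - half * math.sin(heading))
    return [front, rear]

def min_separation(ego, lead, wheelbase=2.7):
    """Smallest distance between any circle-center pair across the two vehicles."""
    ego_centers = circle_centers(*ego, wheelbase)
    lead_centers = circle_centers(*lead, wheelbase)
    return min(math.dist(a, b) for a in ego_centers for b in lead_centers)

def is_safe(ego, lead, threshold=4.0):
    """Hard safety constraint: every center pair must exceed the threshold."""
    return min_separation(ego, lead) >= threshold

# Ego 10 m behind the lead, both heading along +x: safely separated.
print(is_safe(ego=(0.0, 0.0, 0.0), lead=(10.0, 0.0, 0.0)))  # → True
```

Because the binding distance is between the ego's front circle and the lead's rear circle, two cars whose midpoints look comfortably apart can still fail the check, which is exactly the overlap case a point-mass model would miss.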

The vehicle kinematics are modeled using a two-degree-of-freedom bicycle model. The name comes from the fact that the model treats the vehicle like a bicycle: the left and right tires on each axle are combined into a single virtual tire, and the model tracks the front and rear axle positions separately. This approach balances predictive accuracy with computational simplicity. And there are a number of other simplifying assumptions: the vehicle is modeled with a compact two-dimensional kinematic representation that ignores tire slip and rear steering, assumes idealized geometry-based turning, discretizes motion for MPC optimization, and treats the lead vehicle as a constant-velocity, straight-line reference to keep the focus on rear-vehicle control rather than traffic prediction.
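The discrete-time update for a kinematic bicycle model is compact enough to show in full. This is the textbook formulation rather than a copy of the paper's equations, and the wheelbase and time step are assumed values; note how it embodies the simplifications above: pure geometry-based turning, no tire slip, front steering only.

```python
import math

def bicycle_step(x, y, theta, v, a, delta, L=2.7, dt=0.1):
    """One discrete step of the kinematic bicycle model.
    x, y: position; theta: heading (rad); v: speed (m/s);
    a: acceleration input; delta: front steering angle (rad); L: wheelbase (m)."""
    x_next = x + v * math.cos(theta) * dt          # move along the heading
    y_next = y + v * math.sin(theta) * dt
    theta_next = theta + (v / L) * math.tan(delta) * dt  # geometry-based turning
    v_next = v + a * dt                             # speed change from input
    return x_next, y_next, theta_next, v_next

# Drive straight at 10 m/s for 10 steps (1 s): the car covers 10 m.
state = (0.0, 0.0, 0.0, 10.0)
for _ in range(10):
    state = bicycle_step(*state, a=0.0, delta=0.0)
print(round(state[0], 2))  # → 10.0
```

Four states and two inputs is all the optimizer has to reason about per prediction step, which is what keeps the rolling optimization fast enough for real-time use.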

To evaluate this system, the authors built a closed-loop simulation in MATLAB and tested the full pipeline end to end. They focused on two representative scenarios: car-following and overtaking. In each case, a natural-language command was issued, the intent was parsed, and the MPC controller was reconfigured accordingly. The evaluation measured whether the vehicle converged to the correct target behavior, how accurately it tracked the desired position relative to the lead vehicle, and whether the controller could run fast enough for real-time use.

So how did it do? Quite well! Across both scenarios, the system produced stable trajectories, respected safety constraints, and completed each control update well within the time budget.

So, what can we learn from this paper?

Well, the takeaway here is not about better lane changes or lower tracking error. It is about system design. By keeping language at the level of intent and letting a conventional controller handle execution, the authors carve out a narrow but practical role for natural language in safety-critical systems. It is a reminder that many useful applications of language models are not about replacing existing systems, but about making them more flexible, interpretable, and usable by the people who rely on them.

If you want to see the MPC parameter configurations, the data augmentation examples, the vehicle kinematics equations, or the formulation of the objective function and safety constraints, I'd highly recommend downloading the paper.