
Robotic Paper Wrapping by Learning Force Control

Today's article comes from the IEEE Access journal. The authors are Hanai et al., from the University of Osaka in Japan. In this study, they've developed a robot that can wrap rectangular objects using standard wrapping paper. It might not sound impressive at all, I know. But trust me, it really is.

DOI: 10.1109/ACCESS.2025.3606495

Download the Audio (Right-click, Save-As)

Try to think back to the last time you wrapped a birthday present. What did you do, and how did you do it? You probably laid the wrapping paper out, put the present on top, cut the paper to length, and pulled an edge of the paper up and over the box so it was fully covered. Then maybe you folded in the edges to create a clean line, pressed the corners into triangles or trapezoids, folded them in, and got to work with the Scotch tape. At each step, from lifting the box to cutting the paper to manipulating, pressing, and taping it, your hands knew exactly how much pressure to apply at each moment: when to pull the paper taut, when to let it relax, how to move it without creating a crease or bunch-up in the wrong place, and how to navigate around corners without tearing anything. You probably did it all without thinking much about it, but that doesn't mean it isn't an impressive feat. It's a delicate dance of force and finesse that feels effortless but actually requires incredible coordination between your visual system, motor control, and tactile feedback.

Now imagine trying to teach a robot to do the same thing. Every step, every move, every subtle nuance of the procedure. Suddenly, what seemed simple becomes a nightmare of edge cases. Apply too much force and the paper tears. Too little, and you get wrinkles. Move too fast and the paper shifts out of position. The robot will need to seamlessly transition between different types of control, sometimes prioritizing precise positioning, other times focusing on force application, all while adapting to different materials and box sizes.

This is a challenge that many automation companies have simply been avoiding. While robots have conquered many rigid manufacturing tasks, they've largely steered clear of delicate manipulation involving flexible or fragile materials. But avoiding these problems doesn't make them go away, and the potential benefits of solving them are significant. So that's what the authors set out to do in today's paper: teach a robot to wrap rectangular objects with standard wrapping paper. On today's episode we're going to walk through how they did it. Let's jump in.

The core challenge here isn't just automation; it's understanding how to break the task down into its component parts and teach a machine to coordinate between them. The wrapping process really consists of three types of sub-tasks:

  1. Four folding tasks: bending the paper around the box's edges while maintaining tension.
  2. Eight surface-creasing tasks: pressing the paper against the box to form sharp creases.
  3. Four free-space creasing tasks: forming mountain folds in midair, with no underlying surface for support.

What makes this particularly complex is that each phase demands a different balance between position control and force control. During folding, the robot needs precise force control parallel to the paper surface to maintain tension, while using position control perpendicular to the surface to follow the correct trajectory. During creasing against the box, this relationship flips! Position control becomes critical in the tangential direction while force control dominates in the normal direction to create proper creases. For free-space creasing, position control becomes paramount since there's no surface to provide force feedback.
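
To summarize that per-phase balance in one place, here's a tiny illustrative mapping. To be clear, the phase names and this encoding are my own sketch, not anything from the paper's code:

```python
# Illustrative summary of the per-phase control balance described above.
# The phase names and this structure are my own, not the paper's code.
PHASE_CONTROL_EMPHASIS = {
    "folding": {
        "tangential": "force",     # keep tension in the paper along the surface
        "normal": "position",      # follow the fold trajectory around the edge
    },
    "surface_creasing": {
        "tangential": "position",  # trace the crease line precisely
        "normal": "force",         # press into the box to sharpen the crease
    },
    "free_space_creasing": {
        "tangential": "position",  # no surface contact, so geometry drives everything
        "normal": "position",      # no contact force to regulate against
    },
}
```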

In their new system, the authors separate these concerns into two main components. The imitation learning component handles trajectory generation and phase recognition, while the reinforcement learning component optimizes the force control parameters. This division of labor allows each learning method to focus on what it does best, rather than trying to solve everything simultaneously. Let's look a little deeper.

The imitation learning system consists of two specialized neural networks working in tandem. The phase estimation network takes the robot's current pose and direction vector as inputs and determines which of the three sub-tasks should be active. Think of this as the robot's situational awareness system: it's constantly asking "what should I be doing right now, based on where I am and where I'm moving?"

Their network architecture uses what's called a multi-input design, where different types of information get processed through separate pathways before being combined. The robot's pose data (which includes both position and orientation) gets processed through one pathway, while the direction vector gets processed through another. These separate streams then merge to make the final phase decision. This design helps the network understand both where the robot is in space and what kind of movement it's performing.

The skill policy network generates intermediate target positions based on the current pose, the target pose, and geometric context about the intended motion. Rather than trying to plan entire trajectories at once, this network just focuses on the next reasonable waypoint to aim for. It's like having a GPS that gives you turn-by-turn directions rather than showing you the entire route at once.

Meanwhile, the direction label component encodes the relationship between the movement direction and reference directions using cosine similarity. This helps the network understand not just where the robot is, but what kind of movement pattern it should be executing.
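
If you're curious what a multi-input network like that might look like in code, here's a rough sketch in PyTorch. Fair warning: the layer sizes, the 7-dimensional pose (position plus quaternion), and the helper names are all my assumptions; the paper's actual architecture may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PhaseEstimator(nn.Module):
    """Rough sketch of the multi-input phase network described above.
    Layer sizes, the 7-D pose (position + quaternion), and the use of
    PyTorch are all assumptions, not the paper's actual architecture."""

    def __init__(self, pose_dim: int = 7, dir_dim: int = 3, n_phases: int = 3):
        super().__init__()
        # Pathway 1: the robot's pose (position and orientation).
        self.pose_branch = nn.Sequential(
            nn.Linear(pose_dim, 64), nn.ReLU(),
            nn.Linear(64, 32), nn.ReLU(),
        )
        # Pathway 2: the movement direction vector.
        self.dir_branch = nn.Sequential(
            nn.Linear(dir_dim, 32), nn.ReLU(),
        )
        # Merged head: one score per sub-task phase.
        self.head = nn.Sequential(
            nn.Linear(32 + 32, 32), nn.ReLU(),
            nn.Linear(32, n_phases),
        )

    def forward(self, pose: torch.Tensor, direction: torch.Tensor) -> torch.Tensor:
        merged = torch.cat([self.pose_branch(pose), self.dir_branch(direction)], dim=-1)
        return self.head(merged)  # logits over {fold, surface crease, free-space crease}

def direction_labels(motion: torch.Tensor, references: torch.Tensor) -> torch.Tensor:
    """Cosine similarity between the current motion direction (shape (3,))
    and a set of reference directions (shape (K, 3)), in the spirit of the
    direction label component described above."""
    return F.cosine_similarity(references, motion.unsqueeze(0), dim=-1)
```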

So that's how the robot navigates through space, but how does it handle the touch part? How does it apply just the right amount of pressure at the right time? This is where the reinforcement learning component comes in: it handles all the force control parameters through a Soft Actor-Critic agent. Soft Actor-Critic is an algorithm that's particularly good at learning continuous control policies, meaning it can output smooth, precise adjustments rather than just discrete on-off commands. The agent learns to output position gains, force gains, and target forces that get fed into a hybrid position-force controller.

The system calculates the desired force direction by finding the intersection of two planes. The first plane is perpendicular to the robot's velocity vector, and the second plane contains the robot's initial position, current position, and previous position. This geometric approach ensures that forces align with the robot's natural movement patterns rather than being applied in arbitrary directions. It's much more sophisticated than just applying force along fixed coordinate axes: by tying the force direction to the robot's own motion, the system can adapt to different wrapping scenarios while maintaining physically realistic force profiles.
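
Here's what that plane-intersection construction could look like in code. This is a sketch under my own assumptions (variable names, no handling of degenerate geometry), not the authors' implementation:

```python
import numpy as np

def force_direction(p_init, p_prev, p_cur, velocity):
    """Direction of the line where the two planes described above intersect.
    Names and edge-case handling are my own; degenerate cases (zero velocity,
    collinear points) would need guarding in practice."""
    # Plane A is perpendicular to the velocity vector, so the (unit)
    # velocity itself serves as that plane's normal.
    n_a = velocity / np.linalg.norm(velocity)

    # Plane B contains the initial, previous, and current positions;
    # its normal is the cross product of two in-plane vectors.
    n_b = np.cross(p_cur - p_init, p_prev - p_init)
    n_b /= np.linalg.norm(n_b)

    # Two planes intersect in a line that is perpendicular to both
    # normals, i.e. along the cross product of the normals.
    d = np.cross(n_a, n_b)
    return d / np.linalg.norm(d)
```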

Instead of trying to control position and force simultaneously in all directions, the controller uses a selection matrix to determine which type of control applies along each axis. When a diagonal element of the selection matrix equals one, position control dominates in that direction; when it equals zero, force control takes over. The authors set all diagonal elements to one half, creating a balanced hybrid approach rather than switching between pure position and pure force control. The result is a system that's always considering both position and force, but can weight them differently based on what the task requires.
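
As a minimal sketch, the blending works something like this. The gains and error terms here are illustrative stand-ins for what the SAC agent learns, and the real controller is more involved, but with s set to 0.5 the blend matches the balanced setting described above:

```python
import numpy as np

def hybrid_command(pos_error, force_error, kp, kf, s=0.5):
    """Toy selection-matrix blend for a 3-axis task frame. kp and kf stand
    in for the position/force gains the SAC agent learns; this is my own
    simplification, not the paper's full control law."""
    S = s * np.eye(3)                  # 1 = pure position control, 0 = pure force
    Kp, Kf = np.diag(kp), np.diag(kf)  # per-axis position / force gains
    # Position term weighted by S, force term by (I - S); with s = 0.5
    # both terms contribute equally along every axis.
    return S @ (Kp @ np.asarray(pos_error)) + (np.eye(3) - S) @ (Kf @ np.asarray(force_error))
```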

So...did it work? Was this robot actually able to wrap a box? Or not?

They tested it first in simulation, and then in the real world. The experiments were run on a UR3e arm (from Universal Robots) fitted with a force/torque sensor and a gripper. In the end, it looks like they did it: overall task success was 95%, and wrinkling and tearing each happened about 5% of the time. Honestly, that's probably better than I can do with my actual hands; I tear the wrapping paper by accident all the time. Their system generalized well across different paper types and thicknesses, and automatically adapted its force levels to different box dimensions. Ablation studies confirmed that both of the system's main components were important: using either method alone caused success rates to plummet and failure rates to spike.

So what can we learn from this?

Well, this paper demonstrates that manipulation of deformable materials is achievable through problem decomposition. By breaking the challenge into smaller component chunks, the authors were able to build a system that turns a complex, fragile task into one that robots can perform reliably and repeatably. And this success is applicable far beyond wrapping paper. These kinds of force control strategies could be useful in food preparation, textile handling, medical applications, and more.

If you want to see the authors' experimental protocols, explore their network architectures, or examine their training procedures, I'd highly recommend downloading the paper. They provide technical details, experimental results, and implementation strategies that we couldn't cover today.