
Modeling and Analysis of Cooperative Packet Recovery Protocol

Today's article comes from IEEE Access. The authors are Naeem et al., from Aalborg University in Denmark. In this paper, they revisit a relatively obscure packet-recovery protocol from 1997 and model it formally to see whether it's worth implementing in modern streaming infrastructure. Let's dive in and see what they found.

DOI: 10.1109/ACCESS.2024.3389738


In late October 1997, Atlanta, Georgia played host to a four-day technology conference: the 1997 International Conference on Network Protocols. At that event, a paper was presented by a three-person team from AT&T Research Laboratories, Lucent Technologies Bell Labs, and Fujitsu Laboratories of America. AT&T, or “Ma Bell,” had been broken up more than a decade earlier in antitrust proceedings, but these ostensibly now-independent entities still collaborated on research. The paper they presented had all the hallmarks of Bell Labs innovation, namely being decades ahead of its time. It was called “A cooperative packet recovery protocol for multicast video,” and in it, the team outlined a system in which packet loss between a sender and a receiver could be mitigated by a third-party server that steps in to replace packets as they’re lost.

Like so many inventions from the early days of the web (and so many inventions from Bell Labs, especially), the innovation wouldn’t be truly appreciated until decades later, when the problem it preemptively solved became a genuine pain point for modern network engineers.

Now that video streaming and live TV are commonplace, and packet loss is a consistent problem across the board, researchers from Denmark, Pakistan, and Finland are revisiting this paper from 27 years ago. They’re collaborating to formally specify the cooperative packet recovery protocol (in modern terms), model it mathematically, and put it to the test. How exactly would this protocol look if implemented today, and would it work? Let’s find out.

A packet is a unit of data transmitted over a network, consisting of payload and header information such as source and destination addresses. Packet loss occurs when one or more packets of data fail to reach their destination, often due to network congestion, faulty hardware, or signal interference. Packet loss directly affects Quality of Service (QoS) by degrading the reliability of data transmission, particularly for real-time applications like video streaming. Industry standards suggest that packet loss rates below 1% are acceptable for streaming video.

If you're familiar with TCP, you've undoubtedly heard of the 3-way handshake. To review: it’s a SYN from the client, a SYN-ACK from the server acknowledging the SYN and sending its own SYN, and then the ACK from the client finalizing the handshake and acknowledging the server's SYN-ACK. This paper revolves around another concept in this family, called NACK. A NACK is a Negative Acknowledgment. It is used to indicate that a packet was not received or was received with errors, prompting the sender to retransmit the missing or corrupted data. Unlike an ACK, which confirms successful receipt, a NACK signals a failure in transmission.

The packet recovery scheme involves four main players: the source, the client, the server, and the receiver, each with distinct roles:

  • The source is responsible for broadcasting data packets to both the server and client. It continuously sends packets in sequential order.
  • The client receives these packets and stores them in its buffer. If it detects a missing packet (based on sequence numbers), the client sends a NACK (Negative Acknowledgment) to the server to request the missing data.
  • The server also receives the same broadcast from the source and stores all packets in its buffer. When it receives a NACK from the client, it retrieves the requested packet from its buffer and retransmits it to the client.
  • The receiver is the end-user component, relying on the client to forward the fully reconstructed data stream. The client only forwards data to the receiver once it has ensured that all packets, including any retransmitted ones, are in order.

This structure ensures reliable packet delivery even when packets are lost during transmission.
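To make the division of labor concrete, here's a minimal sketch of that flow in Python. It's illustrative only: every name in it is my own, the loss model is a simple coin flip, and the server's copy of the broadcast is assumed to be lossless, an assumption the real protocol doesn't get to make.

    import random

    random.seed(7)        # deterministic demo run

    LOSS_RATE = 0.05      # fraction of source->client packets that get dropped
    NUM_PACKETS = 20

    server_buffer = {}    # the server hears the same broadcast and keeps everything
    client_buffer = {}    # the client's (possibly gappy) copy of the stream
    delivered = []        # what the client forwards, in order, to the receiver

    # 1. The source broadcasts packets, in sequence, to both server and client.
    for seq in range(NUM_PACKETS):
        server_buffer[seq] = f"payload-{seq}"   # server path assumed lossless (a simplification)
        if random.random() > LOSS_RATE:         # the client path drops some packets
            client_buffer[seq] = f"payload-{seq}"

    # 2. The client scans its sequence numbers and NACKs the server for each gap.
    for seq in range(NUM_PACKETS):
        if seq not in client_buffer:
            print(f"client: NACK for packet {seq}")
            # 3. The server answers the NACK from its own buffer.
            client_buffer[seq] = server_buffer[seq]

    # 4. Only once the stream is complete and ordered does the client forward it.
    for seq in range(NUM_PACKETS):
        delivered.append(client_buffer[seq])

    print(f"receiver got all {len(delivered)} packets, in order")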

In this paper (the 2024 paper), the authors wanted to model, simulate, and evaluate the packet recovery scheme. But they first had to define it. For that, they used UPPAAL. UPPAAL stands for Uppsala and Aalborg Universities’ toolset for modeling, simulation, and verification of real-time systems. Broadly, it is a toolset used to model systems as timed automata and then verify their behavior through simulation and formal checking. In this paper, they used UPPAAL to create formal models of the protocol's components, simulate their interactions, and verify whether the protocol met its functional requirements.

  • The source specification consisted of creating a model of the source process where it sends packets via a broadcast channel. They used UPPAAL’s graphical interface to draw automata, with states representing packet transmission phases and transitions triggered by conditions like buffer capacity and packet sequence. Code was used to define conditions that control packet transmission, such as a guard condition to stop sending when the client’s buffer is full.
  • The client specification consisted of modeling the client’s reception of packets, identification of missing packets, and sending of NACK requests. This involved defining the automaton states for receiving packets, sending NACKs, and transmitting packets to receivers. Functions were written in UPPAAL’s modeling language to handle actions like updating buffers, sending NACKs, and processing received packets. The transitions between these states were triggered by packet reception and the detection of missing sequence numbers. (A rough code rendering of this client automaton appears after this list.)
  • The server specification consisted of drawing an automaton to model how the server responds to NACKs by retrieving and sending lost packets to the client. They specified conditions for when the server receives a NACK and added functions to simulate the storage and retrieval of packets from the server’s buffer. Transitions between states occur when NACK requests arrive and are processed.
  • The receiver specification consisted of a simpler model in which the receiver process is defined as receiving packets from the client. The automaton had states representing waiting for and processing incoming packets, ensuring that packets were forwarded in sequence. The interactions between the client and receiver were modeled through synchronization channels in UPPAAL, ensuring that packet flows remained consistent.
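UPPAAL automata are drawn graphically rather than written as code, but to give a feel for what the client model encodes, here's a rough Python rendering of its states and guarded transitions. The state names, guards, and channel callbacks are my own illustrative assumptions, not the authors' actual specification.

    from enum import Enum, auto

    class ClientState(Enum):
        WAIT_PACKET = auto()   # idle on the broadcast channel
        SEND_NACK = auto()     # a sequence gap was detected; request retransmission
        FORWARD = auto()       # a contiguous prefix is complete; push it onward

    class ClientAutomaton:
        """Illustrative stand-in for the client's timed automaton; the if/while
        conditions below play the role of UPPAAL's transition guards."""

        def __init__(self, nack_channel, forward_channel):
            self.state = ClientState.WAIT_PACKET
            self.next_to_forward = 0   # lowest sequence number not yet forwarded
            self.buffer = {}
            self.nack_channel = nack_channel        # sync channel to the server model
            self.forward_channel = forward_channel  # sync channel to the receiver model

        def on_packet(self, seq, payload):
            self.buffer[seq] = payload
            # Guard: any hole below the newest sequence number means a lost packet.
            missing = [s for s in range(self.next_to_forward, seq) if s not in self.buffer]
            if missing:
                self.state = ClientState.SEND_NACK
                for s in missing:
                    self.nack_channel(s)
            # Guard: forward only the contiguous, in-order prefix of the buffer.
            while self.next_to_forward in self.buffer:
                self.state = ClientState.FORWARD
                self.forward_channel(self.buffer[self.next_to_forward])
                self.next_to_forward += 1
            self.state = ClientState.WAIT_PACKET

    # Usage: client = ClientAutomaton(nack_channel=print, forward_channel=print)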

Once the entire system was modeled in UPPAAL, it was time to simulate it in MATLAB. UPPAAL’s primary strength lies in the verification and simulation of real-time systems; it is less suited to analyzing a system’s performance under varied data loads and network conditions. To address this, the authors exported key findings from their UPPAAL models, such as timing constraints and packet recovery mechanisms, into MATLAB to simulate how the system performed under different scenarios.

In MATLAB, they defined performance parameters, including the packet loss rate, inter-packet delay (IPD), buffer sizes for both the client and server, and transmission data rate. These simulations focused on understanding how changes in network conditions would impact the packet recovery process, specifically the active part of the buffer (APB) during transmission.
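Since the authors' MATLAB code isn't reproduced here, the following is a back-of-the-envelope Python sketch of the kind of sweep being described: watching the APB shrink as the transmit rate (DRTx) outpaces the effective receive rate (DRRx plus the inter-packet delay). The update rule is my own illustrative assumption, not the paper's model.

    BUFFER_SIZE = 500    # client buffer capacity, in packets
    DR_TX = 100.0        # packets/s arriving from the source (DRTx)
    DR_RX = 95.0         # packets/s drained toward the receiver (DRRx)
    IPD = 0.002          # inter-packet delay (s), charged against the drain side
    DT = 0.5             # simulation time step (s)

    # Assumed effective drain rate: each drained packet also pays the IPD.
    effective_rx = 1.0 / (1.0 / DR_RX + IPD)

    occupancy = 0.0
    for step in range(60):
        occupancy += (DR_TX - effective_rx) * DT   # backlog grows with the rate gap
        occupancy = min(occupancy, BUFFER_SIZE)    # can't exceed physical capacity
        apb = BUFFER_SIZE - occupancy              # space left for recovery work
        if apb <= 0:
            print(f"t={step * DT:4.1f}s  APB exhausted: packets overwritten before recovery")
            break
        if step % 10 == 0:
            print(f"t={step * DT:4.1f}s  APB={apb:6.1f} packets")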

Here's what they found:

  • Effect of IPD on APB: They found that the size of the active part of the buffer (APB) decreases continuously over time due to inter-packet delays, particularly when data transmission (DRTx) and receiving (DRRx) rates are not synchronized. The buffer size needs to be properly managed to avoid packet overwriting or loss during recovery.
  • Buffer Size and Data Rate Optimization: The buffer size and data rate needed careful tuning to maintain effective packet recovery. If the buffer was too small relative to the data rate, packets were lost before they could be recovered. (A back-of-the-envelope sizing sketch follows this list.)
  • Maximum Tolerable Packet Loss Rate: The simulations showed that the system could tolerate a packet loss rate of less than 1% while maintaining acceptable Quality of Service (QoS). Beyond this threshold, the system struggled to recover packets in time, leading to degraded performance, especially in real-time video transmission scenarios.
  • Impact of Network Congestion: Simulating network congestion revealed that the packet recovery protocol was robust under moderate levels of congestion, but as congestion increased, recovery times also increased, leading to higher latency and the potential for dropped packets in the buffer before recovery.
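On that second point, a quick worked example helps show why buffer size and data rate have to move together. The rule of thumb below is my own back-of-the-envelope reasoning, not a formula from the paper: a lost packet can only be recovered if it's still in the server's buffer when the NACK round trip completes, so the buffer must cover at least that window at the transmit rate.

    # Assumed sizing rule: buffer >= transmit rate x (gap-detection delay + NACK RTT)
    dr_tx = 1000       # packets/s at the transmit rate
    rtt = 0.080        # client <-> server round-trip time (s)
    detect = 0.020     # time for the client to notice the sequence gap (s)

    min_buffer = dr_tx * (rtt + detect)
    print(f"minimum server buffer: about {min_buffer:.0f} packets")   # ~100 packets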

So, what can we take away from this work?

Well, those last two points, about the maximum tolerable packet loss rate and network congestion, are really the critical ones here. To reiterate what those results mean: this system, when put under the load of a simulation, was able to regain some lost packets. But only when things were mostly fine anyway. When packet loss was severe enough that QoS would be affected (greater than a 1% packet loss rate), the system failed to meaningfully improve the situation. In other words: if your stream is mostly fine, this system could make it a little better than it already is. But if your stream is messed up, if that Zoom call is freezing or your live football match starts jumping from frame to frame, this would not help you. In effect, it does work: but only at the times you don't really need it to work. If we're looking for a silver bullet for packet loss, we're going to need to keep looking.

If you’d like to view the architecture diagrams of the system they created, or read the actual formulas they used to model it, I’d recommend that you download the paper.