Today's article comes from IEEE Access. The authors are Naeem et al., from Aalborg University, in Denmark. In this paper they revisit a relatively obscure packet-loss-aversion protocol from 1997, and model it formally to see if it's worth implementing in modern streaming infrastructure. Let's dive in and see what they found.
DOI: 10.1109/ACCESS.2024.3389738
In late October 1997, Atlanta, Georgia played host to a 4-day technology conference: the 1997 International Conference on Network Protocols. At that event, a paper was presented by a 3-person team from AT&T Research Laboratories, Lucent Technologies Bell Labs, and Fujitsu Laboratory of America. AT&T, or “Ma Bell,” had been broken up a decade before in antitrust proceedings, but these ostensibly now-independent entities still collaborated on research. The paper they presented had all the hallmarks of Bell Labs innovation, namely being decades ahead of its time. The paper was called “A cooperative packet recovery protocol for multicast video,” and in it, they outlined a system in which packet loss between a sender and a receiver could be mitigated by a third-party server that steps in to replace packets as they’re lost.
Like so many inventions from the early days of the web (and so many inventions from Bell Labs, especially), the innovation displayed wouldn’t be truly appreciated until decades later when the problem they were preemptively solving would become a true pain for modern network engineers.
Now that video streaming and live TV are commonplace, and packet loss is a consistent problem across the board, researchers from Denmark, Pakistan, and Finland are revisiting this paper from 27 years ago. They’re collaborating to formally specify the cooperative packet recovery protocol (in modern terms), model it mathematically, and put it to the test. How exactly would this protocol look if implemented today, and would it work? Let’s find out.
A packet is a unit of data transmitted over a network, consisting of payload and header information such as source and destination addresses. Packet loss occurs when one or more packets of data fail to reach their destination, often due to network congestion, faulty hardware, or signal interference. Packet loss directly affects Quality of Service (QoS) by degrading the reliability of data transmission, particularly for real-time applications like video streaming. Industry standards suggest that packet loss rates below 1% are acceptable for streaming video.
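To make that 1% figure concrete, here's a minimal Python sketch (the function name and counters are my own, not from the paper) of how a receiver might estimate its packet loss rate and check it against the QoS threshold:

```python
def loss_rate(sent: int, received: int) -> float:
    """Fraction of packets that never arrived."""
    if sent == 0:
        return 0.0
    return (sent - received) / sent

QOS_LOSS_THRESHOLD = 0.01  # industry guideline: <1% loss for streaming video

rate = loss_rate(sent=10_000, received=9_950)
print(f"loss rate: {rate:.2%}")                 # 0.50%
print("QoS acceptable:", rate < QOS_LOSS_THRESHOLD)  # True
```

At 50 lost packets out of 10,000, the stream sits comfortably under the threshold; at 101, it wouldn't.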
If you're familiar with TCP, you've undoubtedly heard of the 3-way handshake. To review: it’s a SYN from the client, a SYN-ACK from the server acknowledging the SYN and sending its own SYN, and then the ACK from the client finalizing the handshake and acknowledging the server's SYN-ACK. This paper revolves around another concept in this family, called NACK. A NACK is a Negative Acknowledgment. It is used to indicate that a packet was not received or was received with errors, prompting the sender to retransmit the missing or corrupted data. Unlike an ACK, which confirms successful receipt, a NACK signals a failure in transmission.
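To illustrate the NACK idea, here's a small sketch (structure and names are my own invention, not the paper's) of a receiver that tracks sequence numbers and emits NACKs for the gaps it observes:

```python
def detect_gaps(expected_next: int, arrived_seq: int) -> tuple[list[int], int]:
    """Return the sequence numbers to NACK and the new expected counter.

    If packet `arrived_seq` shows up when we expected `expected_next`,
    everything in between was lost (or reordered) and gets a NACK.
    """
    missing = list(range(expected_next, arrived_seq))
    return missing, arrived_seq + 1

# Packets 0, 1, 2 arrive, then 5: packets 3 and 4 draw NACKs.
expected = 0
nacks = []
for seq in [0, 1, 2, 5]:
    gaps, expected = detect_gaps(expected, seq)
    nacks.extend(gaps)
print("NACK:", nacks)  # [3, 4]
```

The key difference from an ACK-based scheme is that the receiver only speaks up on failure, which keeps control traffic low when the stream is healthy.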
The packet recovery scheme involves four main players: the source, client, receiver, and server. Broadly, the source transmits the packet stream, the client and receiver consume it and report any missing packets via NACKs, and the third-party server buffers recently sent packets so it can retransmit them on request. This division of labor ensures reliable packet delivery even when packets are lost in transit.
In this paper (the 2024 paper), the authors wanted to model, simulate, and evaluate the packet recovery scheme. But they first had to define it. For that, they used UPPAAL. UPPAAL stands for Uppsala and Aalborg Universities’ toolset for modeling, simulation, and verification of real-time systems. Broadly, it is a toolset used to model systems as timed automata and then verify their behavior through simulation and formal checking. In this paper, they used UPPAAL to create formal models of the protocol's components, simulate their interactions, and verify whether the protocol met its functional requirements.
Once the entire system was modeled in UPPAAL, it was time to simulate it in MATLAB. UPPAAL’s primary strength lies in verification and simulation of real-time systems, but it is less suited to analyzing a system’s performance under varied data loads and network conditions. To address this, the authors exported key findings from their UPPAAL models, such as timing constraints and packet recovery mechanisms, into MATLAB to simulate how the system performed under different scenarios.
In MATLAB, they defined performance parameters, including the packet loss rate, inter-packet delay (IPD), buffer sizes for both the client and server, and transmission data rate. These simulations focused on understanding how changes in network conditions would impact the packet recovery process, specifically the active part of the buffer (APB) during transmission.
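As a rough picture of what sweeping those parameters looks like, here's a toy Python stand-in for the MATLAB setup. The numbers and the repair model are invented for illustration (the paper's actual formulas are in the original): it assumes a lost packet is repaired only if it survives in the server's FIFO buffer for the full NACK round trip, with one new packet displacing the oldest every inter-packet delay.

```python
import math

def residual_loss(loss_rate: float, server_buffer: int,
                  ipd_ms: float, nack_rtt_ms: float) -> float:
    """Idealized model: a lost packet is repaired iff it is still in the
    server's FIFO buffer when the client's NACK round trip completes."""
    packets_during_rtt = math.ceil(nack_rtt_ms / ipd_ms)
    repairable = packets_during_rtt < server_buffer
    return 0.0 if repairable else loss_rate

# Sweep server buffer sizes at a fixed 2 ms IPD and 40 ms NACK round trip.
for buf in (8, 16, 32, 64):
    r = residual_loss(loss_rate=0.02, server_buffer=buf,
                      ipd_ms=2.0, nack_rtt_ms=40.0)
    print(f"buffer={buf:3d}  residual loss={r:.1%}")
```

Even in this crude model, the interplay the authors studied is visible: shrink the buffer or stretch the inter-packet delay far enough and recovery stops working entirely.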
Here's what they found: the headline results concern the maximum tolerable packet loss rate and the system's behavior under network congestion.
So, what can we take away from this work?
Well, those two results are really the critical ones here. To reiterate what they mean: this system, when put under the load of a simulation, was able to recover some lost packets, but only when things were mostly fine anyway. When packet loss was severe enough that QoS would be affected (a loss rate greater than 1%), the system failed to meaningfully improve the situation. In other words: if your stream is mostly fine, this system could make it a little better than it already is. But if your stream is genuinely degraded, if that Zoom call is freezing or your live football match starts jumping from frame to frame, this would not help you. In effect, it does work: but only at the times you don't really need it to work. If we're looking for a silver bullet for packet loss, we're going to need to keep looking.
If you’d like to view the architecture diagrams of the system they created, or read the actual formulas they used to model it, I’d recommend downloading the paper.