Free Sample Episode

A Scalable Real-Time SDN-Based MQTT Framework for Industrial Applications

Today's article comes from the IEEE Open Journal of the Industrial Electronics Society. The authors are Shahri et al., from the University of Aveiro, in Portugal. In this paper they argue that the MQTT protocol is not suitable for industrial applications because it lacks timeliness guarantees. They propose a new system to overcome these limitations. Let's see what they came up with.

DOI: 10.1109/OJIES.2024.3373232


Let’s say you and I are Oompa Loompas in a chocolate factory. We have various jobs, and we do all the meaningful work, while the crazy guy in the hat gets all the credit. But that’s fine, it’s what we signed up for. Your job is to take caramel squares and dip them in chocolate. Then you hand them to me. I sprinkle a little salt on top, carefully wrap each chocolate-covered caramel in a cellophane wrapper, twist the ends, and then place it in a box lined with tissue paper. You dip, I wrap.

Everything’s fine except you can dip way faster than I can wrap. It takes you 2 seconds to dunk a caramel, and it takes me 10 seconds to sprinkle, wrap, and put it in the box. So what do we do? If you dunk as fast as you can, you’re going to be sending me chocolates 5X faster than I can wrap them, and within a few minutes we’re going to have an “I Love Lucy” situation where I’m stuffing chocolates in my pockets to keep up with your pace. Nobody wants that.

So we have at least two options. We can, for example:

  • Hire 4 more Oompa Loompas to work with me, doing the same job. Now we can collectively wrap chocolates just as fast as you can dunk them. But for maximum efficiency, when you dip a chocolate, it needs to go to whichever Oompa Loompa has their hands free. So we need a mechanism, some kind of fancy conveyor belt, that can do that.
  • Leave it as just me at the wrapping station, but instead of the chocolates coming right to me, they should get placed on a cooling rack. So now I don’t need to worry about how fast you’re placing them on the cooling rack; I’ll just take one rack at a time, pull one chocolate off, and take my time wrapping it, then move on to the next one. This makes sense in a lot of cases because let’s say you only dunk caramels for the one hour a day after the caramel cooking process finishes. So during that hour you’re dunking like a madman, but in reality, I don’t need to keep up with you during that moment. In the course of my 8-hour shift, I’ll be able to more than cover the caramels you dunked during your hour.

The two examples here, the fancy conveyor belt and the cooling racks, are both queues. And you and I, the Oompa Loompas, are components or “workers” in a distributed system (a factory, in this case). The queues in our factory, these special pieces of equipment that let us buffer the work passing between different components, are the core of all distributed systems. Whether those distributed systems are web applications, car assembly lines, or long-running MapReduce jobs, odds are there is some kind of queuing mechanism at their core. Queues are the fundamental data structure, the skeleton that makes giant, unwieldy, unpredictable systems possible. In my opinion, this was best articulated in 2014 in the Reactive Manifesto. If you’ve never read it, check out reactivemanifesto.org; it only takes a couple of minutes to read the whole thing.
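If you want to see the cooling-rack idea in code, here’s a minimal producer/consumer sketch in Python using the standard library’s thread-safe queue. The chocolate names and counts are made up for illustration; the point is that the dipper and the wrapper never talk to each other directly, and the queue absorbs the rate mismatch.

```python
import queue
import threading

cooling_rack = queue.Queue()  # the buffer between the two workers
wrapped = []

def dipper(n):
    # The fast producer: loads the rack as quickly as it likes.
    for i in range(n):
        cooling_rack.put(f"caramel-{i}")
    cooling_rack.put(None)  # sentinel: the dipping shift is over

def wrapper():
    # The slow consumer: pulls one chocolate at a time, at its own pace.
    while True:
        item = cooling_rack.get()
        if item is None:
            break
        wrapped.append(f"wrapped({item})")

t1 = threading.Thread(target=dipper, args=(5,))
t2 = threading.Thread(target=wrapper)
t1.start(); t2.start()
t1.join(); t2.join()
```

Because the queue decouples the two, the dipper can finish its whole batch long before the wrapper catches up, exactly like the cooling rack.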

There are a number of commercial queuing products on the market. AWS SQS is probably the biggest name there; Google Cloud Pub/Sub is also very popular. But those kinds of cloud service offerings aren’t always the right form factor for the system you’re building.

By the way, don’t get hung up on the difference between a “pub/sub” and a “queue”. There are certainly differences but for the purpose of this episode we’re treating them as one family of products and using the terms interchangeably.

Anyway, sometimes it makes sense to use the cloud queues, and sometimes it makes sense to operate your own. Many of the open-source options aren’t just applications; they’re a novel queuing protocol plus an application that implements it. If you use RabbitMQ, that’s all based on the AMQP protocol. If you use ZeroMQ, it’s based on ZMTP.

Side note about ZeroMQ. If you want to go down a rabbit hole on queuing, there is a YouTube video from a decade ago called “Pieter Hintjens - Distribution, Scale and Flexibility with ZeroMQ” that only has, I think, 12k total views. It is in my top 5 best tech talks I’ve ever seen. If you’re in management or leadership in any way and you’ve never heard of Conway's Law, stop listening to this episode right now and go watch that instead. RIP to Pieter.

I digress. In addition to the AMQP protocol and the ZMTP protocol, there’s a protocol called MQTT: the Message Queuing Telemetry Transport protocol. Like some of the others, it’s an application-layer protocol built on top of TCP. So if you think of the OSI model, it lives in the same layer as things like HTTP, FTP, and SSH. MQTT has been around since the late 90s; it’s big and open, it has an ISO standard, a governing body, and all that. It’s not a small one-person-in-a-garage kind of thing; it’s a big, legitimate protocol, and changes to it take time. There’s a process and a bureaucracy to navigate. MQTT doesn’t change on a whim.

So that brings us to today’s paper, which, by my count, is at least the 6th paper by these authors on this topic. Since, I believe, 2021, they have been slowly building the case in a series of articles that MQTT isn’t currently suitable for industrial applications, and they have a proposal for how exactly we can overcome that.

In previous work, they defined the problem space, mapped out possible options, and architected a solution, and this paper is actually just enhancing the previous solution and then running a bunch of advanced benchmarking and analysis on it to prove that it does what they claim it can do. Based on some of the hints they dropped during the paper, I’m going to guess that they have at least 5-6 more papers to publish on this, that will probably take them several more years, so this one comes right in the middle of this giant decade-long case that they’re making. Since we’ve never talked about their research at all, I figured now would be a good time to catch up on the problem they defined, the solution they’ve architected so far, and then give a preview of where this research is going over the next couple of years.

The problem with MQTT:

Think of MQTT as offering 3 types of message delivery:

  • Unicast: One sender, one recipient
  • Multicast: One sender, a sub-group of recipients
  • Broadcast: One sender, all recipients. Technically, MQTT doesn’t do broadcasts natively, but in practice, that can be implemented by the user.

Within those services, they offer different QoS (quality of service) guarantees, named QoS 0, 1, and 2:

  • QoS 0: Send at most once. This is like me dropping a postcard in a mailbox. I can guarantee that I sent it. Whether or not anyone ever receives it is really their problem. In some cases, this is all you need. Maybe the thing going into the queue is an alert notification, and the user only needs to see it if they’re currently in the app and have strong WiFi; if they miss the alert, it doesn’t really matter.
  • QoS 1: Deliver at least once. This is like me sending a letter that requires a signature. If they try to deliver it and nobody’s home, I might just send another copy, and another, and another, and now I have multiple mail carriers all coming to your door with this letter. And when you finally do answer the door, they might hand you a stack of a hundred of them. That’s fine, as long as I know you got at least one of them, and signed it, that’s all I care about. Sometimes a system will need to perform idempotent operations where it really doesn’t matter if it’s invoked twice, as long as it’s invoked at least once—that's all we care about.
  • QoS 2: Deliver exactly once. This is like I’m sending you a package of something expensive you ordered from my store. You have to get it, and you have to confirm that you got it, but you only ordered one, so under no circumstances should you get the package twice. Any action in a distributed system that is mission-critical and not idempotent would probably fall under this use case. That being said, deliver-exactly-once has significant overhead. There’s a 4-way handshake involved, and just a lot of pieces that need to be synced up. But let’s say your system is sending money from one person to another. If that needs to happen once and only once, and the ramifications of it happening zero times or more than 1 time are disastrous, then the overhead of QoS 2 is worth it.
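To make those three delivery semantics concrete, here’s a toy Python simulation. This is not real MQTT, just an illustration with hypothetical function names: the channel misbehaves in a fixed way (it eats the QoS 0 message outright, and it loses the first acknowledgment for QoS 1 and QoS 2), so you can see exactly where a loss, a duplicate, or a deduplication happens.

```python
def qos0(msg):
    # Fire and forget: one attempt, no ack. Here the channel eats it,
    # and the sender never knows.
    delivered = False  # simulated: the one attempt is dropped
    return [msg] if delivered else []

def qos1(msg):
    # At least once: the message arrives, but the first ack is lost,
    # so the sender re-sends and the receiver sees a duplicate.
    received = []
    acked = False
    attempt = 0
    while not acked:
        attempt += 1
        received.append(msg)   # the message always gets through
        acked = attempt > 1    # but the first ack gets lost
    return received

def qos2(msg):
    # Exactly once: same lossy acks, but the receiver de-duplicates by
    # packet id before handing anything to the application.
    seen = set()
    received = []
    acked = False
    attempt = 0
    while not acked:
        attempt += 1
        packet_id = 42         # same id on every retransmission
        if packet_id not in seen:
            seen.add(packet_id)
            received.append(msg)
        acked = attempt > 1
    return received
```

Under this misbehaving channel, QoS 0 delivers nothing, QoS 1 delivers the letter twice, and QoS 2 delivers the package exactly once. (The real protocol’s 4-way handshake for QoS 2 is more involved than a dedup set, of course.)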

The issue with these guarantees (QoS 0, 1, 2) is not about what they offer; it’s about what they don’t. None of these guarantees says anything about timeliness. They say what is going to happen, but not when it’s going to happen. There’s no way for me to say, for example: “I need at-least-once delivery within this specified time frame”. Why does this matter? Well, let’s go back to the chocolate factory. Let’s say the caramels you’re dipping right now are a special order for Violet’s birthday, tomorrow. They’re special caramels; they’ve got her name stamped right into the chocolate coating. That order needs to be boxed and packaged to go out by this afternoon. You’re loading up these cooling racks with chocolates, and there’s no way for you to communicate to me not only that I should prioritize these, but that there’s a specific deadline I need to hit.

This lack of temporal / timeliness / “predictable execution” guarantees, the authors argue, makes MQTT unsuitable for industrial environments. And that should make some kind of intuitive sense. If you’re running a factory, you need to be able to assign things different priority levels and attach specific timelines to their delivery and execution. Without that, your job is a lot harder.

What the authors propose:

Starting in MQTT version 5, which was released in 2019, developers can add what are called “user properties” to a message. Instead of the message having just a fixed header and a payload, there’s a fixed header, a variable header, and a payload, and the variable header is where you can add key-value pairs of data. So in their proposal, they’re adding a series of key-value pairs as user properties:

  • Flow Index (i): This identifies the specific data flow. It helps in distinguishing multiple flows of messages in the system.
  • Transmission Time (Ci): Specifies the time it takes to transmit a message, including any overhead.
  • Priority (Pi): Indicates the priority level of the message.
  • Period (Ti): Defines the minimum inter-arrival time or period between two successive messages published by the sender.
  • Deadline (Di): Specifies the maximum allowable time for the message to be delivered. It must be less than or equal to the period.
  • Bandwidth Usage (BWi): Defines the maximum bandwidth that the message can use.
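Taken together, those six properties are just string key-value pairs attached to each message. Here’s a sketch of what assembling them might look like; the helper name is hypothetical, and numeric values get stringified because MQTT user properties are UTF-8 string pairs. The one invariant baked in comes straight from the list above: Di must be less than or equal to Ti.

```python
def make_rt_properties(flow_index, c_i, p_i, t_i, d_i, bw_i):
    """Build the real-time metadata as MQTT-style user-property pairs."""
    if d_i > t_i:
        raise ValueError("deadline Di must be <= period Ti")
    return [
        ("i",   str(flow_index)),  # flow index
        ("Ci",  str(c_i)),         # transmission time, incl. overhead
        ("Pi",  str(p_i)),         # priority level
        ("Ti",  str(t_i)),         # minimum inter-arrival period
        ("Di",  str(d_i)),         # delivery deadline (<= Ti)
        ("BWi", str(bw_i)),        # maximum bandwidth usage
    ]

# A flow publishing every 100 ms that must be delivered within 20 ms:
props = make_rt_properties(flow_index=3, c_i=0.5, p_i=1,
                           t_i=100, d_i=20, bw_i=64)
```

In a real client you’d attach these pairs to the PUBLISH packet’s properties; the units and value ranges here are illustrative, not taken from the paper.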

But all of those new key-value pairs in the user properties don’t magically do anything by themselves, right? It’s like, if I drop a postcard in the mail and write on it “Hey USPS, deliver this by Thursday”, that’s not going to have any effect on anything at all. Conveying my timeliness desire doesn’t matter if there’s not also the infrastructure in place to make good on my request. With USPS, that means buying the class of post that guarantees delivery at a certain date. With MQTT, we need some kind of system that can look at the requests in the header and make good on them. That is where SDN comes in.

Very briefly, SDN (software-defined networking) is an approach where switches, routers, and other network infrastructure route packets based on centralized, programmable logic instead of fixed, device-by-device rules. We covered SDN in more detail in the episode on October 9th titled “Traffic Classification in Software-Defined Networking Using Genetic Programming Tools.” I’d recommend reviewing that episode for more details. And another sidenote: This is a good time to mention, as of yesterday afternoon, the episode archive is up at JournalClub.io/archive. So if there are any episodes from the past that you missed and you want access to, like the SDN episode I just mentioned, go to the archive, find the episode name, and then send me a note with the episode name so I can re-trigger the email send for you. At some point, re-sends will be fully automated, but, baby steps.

Anyway, back to the story. When MQTT is running, and its publishers are sending unicasts, multicasts, and broadcasts through the MQTT broker and out to the subscribers, the packets involved may be getting routed through SDN (if you’ve set up that kind of system). So in the authors’ proposal, the SDN looks at the headers, sees the timeliness requests, and then performs the routing in a way that maximizes the chances of those timeliness requests being honored.

And that’s basically it, in a nutshell. The systems they defined are called RT-MQTT (Real-Time MQTT) and MRT-MQTT (Multicast Real-Time MQTT), the latter of which is, for all intents and purposes, a superset of RT-MQTT with extra logic for multicast.

Most of the paper is spent defining the architecture, governing logic, and algorithms for how the SDN would operate, but then the authors spend a considerable amount of time simulating and analyzing the performance of their new system. They set up emulations in Mininet, which is a network emulator, and used the Ryu OpenFlow controller to simulate some of the lower-level aspects of SDN. They ran the system under low, medium, and high loads and focused their analysis on WCRT.

WCRT stands for Worst-Case Response Time. If you’re familiar with Big O notation for algorithms, it’s a lot like that, but for a distributed system, in the sense that both represent upper bounds. To bound the WCRT, they used two analysis methods: HA and TA. HA (Holistic Approach) accumulates the worst-case delays across all nodes and switches on the path, while TA (Trajectory Approach) works backward through the network path to calculate the latest time a message could arrive at its final destination. HA is more pessimistic; TA gives tighter bounds but requires more computational resources and is more complex to implement. So they considered both. Here are the broad strokes of their results:

  • MRT-MQTT outperforms MQTT: The system reduced Worst-Case Response Time (WCRT) significantly for time-sensitive traffic in multi-edge networks.
  • Traffic prioritization enhances timing guarantees: Prioritized messages exhibited lower WCRT compared to lower-priority messages, showing that the framework effectively manages different priority levels.
  • Schedulability analysis validated: Both Holistic Approach (HA) and Trajectory Approach (TA) provided accurate WCRT estimates that closely matched the experimentally observed results, demonstrating the reliability of the analysis methods.
  • Trade-off between accuracy and complexity: HA provided faster but more pessimistic results, while TA gave tighter, more precise WCRT bounds but at the cost of increased computational complexity.
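As a side note on the analysis: the flavor of the HA-style per-hop math is the classic fixed-priority response-time recurrence, where a flow’s worst case is its own transmission time plus interference from every higher-priority flow sharing the link. This sketch is the textbook version, not the paper’s actual formulas (which also handle multi-hop trajectories and other details), and it assumes the flow set is schedulable, since otherwise the iteration would not converge.

```python
import math

def link_wcrt(c_i, higher_prio):
    """Worst-case response time of one flow on one link.

    c_i: the flow's own transmission time.
    higher_prio: list of (Cj, Tj) pairs for flows that can preempt it,
    each contributing ceil(R / Tj) * Cj of interference.
    """
    r = c_i
    while True:
        interference = sum(math.ceil(r / t_j) * c_j
                           for c_j, t_j in higher_prio)
        r_next = c_i + interference
        if r_next == r:
            return r  # fixpoint reached: this is the worst case
        r = r_next

# A flow with transmission time 2, competing against two
# higher-priority flows (C=1, T=5) and (C=2, T=10):
wcrt = link_wcrt(2, [(1, 5), (2, 10)])  # converges to 5
```

Chaining bounds like this hop by hop along the path is the “holistic” idea; the trajectory approach instead reasons about the whole end-to-end trajectory at once, which is why it’s tighter but costlier.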

The road ahead:

The results so far are promising, but the authors aren't done yet. Over the next few years, we can expect more papers from them that will go further and deeper on this. Future research is likely to focus on improving scalability across even larger multi-edge networks, refining the precision of the real-time schedulability analyses, and exploring advanced techniques for handling mixed traffic loads with minimal impact on non-real-time data. They may also investigate integrating more robust security mechanisms that don't compromise the timeliness guarantees.

We’ll be keeping an eye on them as they continue to publish. And when there’s a big announcement, you’ll hear about it here on Journal Club.

If you’d like to view the architecture diagrams they created for this system, or the formulas that govern its execution, please do download the paper. If you’d like to read the previous papers this team authored on this subject, they’re listed in the reference section, as reference numbers 12-16.