A Polynomial-Time Algorithm for Detection of Uncovered Transitions in a Petri Net-Based Concurrent System

0:00 0:00

Download the Audio (Right-click, Save-As)

Today’s paper is about an algorithm that is unusually efficient at a relatively-obscure task. The authors have found a way to surface uncovered Petri net transitions in cubic polynomial time. Now, those words might not mean anything to you yet. So we’re going to put that whole idea off to the side, and we’re going to come back to it in a few minutes. Forget about it for now, and instead I want to you ask you a question:

How do you optimize a distributed system with thousands of moving parts and dozens of different goals? Short answer: you model it first. If you don’t model it, then any optimization you try to make is just shooting in the dark. Without a model, you’ll have no way of knowing if the modifications you’re making are:

…working as intended
…or getting muted by some other part of the system
…or even causing problems for some other component

Okay, so how do you model it then? There are a few options. The one we’re looking at today is called a Petri net. Not a Petri dish, that’s a totally different thing named after a different guy. Petri nets are named after Carl Adam Petri. He was a doctoral student in the early 60’s and he laid the mathematical foundations for Petri nets in his dissertation. Sixty years later, they’re an important tool in Systems Theory, and they’re used to model all kinds of complex interconnected processes, workflows, networks, interactions, operations, and activities. Here’s a few examples of how they’re used:

Manufacturing Processes: Petri nets track how machines and resources interact, to assemble products efficiently.
Traffic Control: They model how traffic lights coordinate to keep vehicles and pedestrians moving safely.
Computer Networks: They show how data packets are transmitted and received by all the switches and routers and nodes in a network.
Healthcare Workflows: They model tasks like patient-admissions, treatments, surgeries and exams, and all the resources, equipment, rooms, and staff that interact with them.

…and there are many other examples.

So what is a Petri net exactly? It’s a mathematical model of a distributed system, but it’s commonly represented visually as a diagram. These diagrams are distilled compared to the actual mathematical models (they don’t have quite as much information in them), but they do a good job of visually conveying an entire system at once. They look a bit like flow-charts, or UML diagrams. In a Petri net visualization, you’ll generally see three types of shapes:

Circles (Places): Represent states, conditions, or resources in the system.
Rectangles (Transitions): Represent events or actions that change the state.
Arrows (Arcs): Indicate the flow or relationship between places and transitions, showing how states and actions are connected.

Let’s take a deeper look at the rectangles, the “transitions” (the actions). At any given time, a transition can generally be either enabled (ready to fire), disabled (not able to fire), or fired (executed). But a transition can also be "uncovered." This means the transition is not part of any t-invariant. In other words, it’s not included in any cyclic behavior or functional loop within the system. An uncovered transition is bad. The event or action it represents will never occur because it’s not actually connected to the rest of the system in any functional way. Its very existence indicates wasted design effort, or even a flaw in the system’s logic. Remember there are resources or processes tied to that transition, but they're effectively useless when the transition is uncovered. If you can find uncovered transitions, they are likely the lowest hanging fruit that you can fix in order to begin optimizing the system as a whole.

Accordingly, there is a whole body of research that concerns finding and surfacing uncovered transitions. It’s not that they’re hard to recognize when you see them, it’s that (in a sufficiently large and complex system) they’re hard to locate. If you try to brute force it, and have your algorithm exhaustively explore the entire state space of the Petri net, it’s going to take too long. It would need to enumerate all possible markings and transitions, and cycle through every combination of states and firing sequences until it finds them. This would work, eventually, but would also be super inefficient. It would have a Big O complexity of something like exponential time. In other words, doubling the input size results in computation time increasing by a factor of 2^n . For large systems, this growth quickly becomes infeasible.

Luckily there are a number of more modern approaches to this that you can use to save time. The most common is probably the Martínez–Silva algorithm. While these approaches are more efficient than brute force, they still struggle with scalability. So that brings us (finally) to today’s paper. These authors have produced another algorithm, and this one completes in polynomial time. Specifically cubic polynomial time. Here's how it works:

When the algorithm gets called you need to pass it an incidence matrix, which is just the mathematical version of the diagram I described earlier. It starts by just initializing an empty list. This is where it’s going to store any uncovered transitions that it finds. Next, it flips (transposes) the rows and columns of the incidence matrix. This is just to prepare it for the next steps, which involve analyzing the transitions mathematically. Now, the algorithm reworks the transposed matrix into something called Reduced Row Echelon Form (RREF). This is a cleaned-up version where patterns in the data become much clearer. It involves reorganizing and simplifying the rows to highlight key features. With the RREF in hand, it then looks at each row (representing places in the system) to find transitions that don’t balance properly. If a transition has no role in maintaining the flow of tokens through the system, it’s flagged as uncovered, and gets added to the list.

So, why is this better? Primarily because it has polynomial complexity (cubic). Other methods generally have either exponential or higher polynomial complexity. It’s also better because it focuses on transitions, not full invariants. Instead of exhaustively computing all t-invariants, it directly identifies transitions that fail the invariant conditions. This saves significant computational effort. Or at least that was their theory. But the authors needed to put this to the test.

To evaluate it, they conducted experiments on a benchmark set of 386 Petri net systems. The evaluation compared their algorithm's performance to the Martínez–Silva algorithm and the PIPE tool. They measured both efficiency (how quickly the algorithm ran) and effectiveness (how accurately it detected uncovered transitions).

The results showed that their algorithm successfully analyzed 384 out of 386 benchmarks, whereas the Martínez–Silva algorithm failed on 7 cases, and the PIPE tool failed on 47 cases (often because they couldn’t complete within the hour that the team allotted). In terms of speed, the new algorithm consistently outperformed the other methods (but the delta of the runtimes was obviously highly-dependent on the size of the input).

Overall their approach was a success. And if their paper ends up being replicable, it could have an outsized impact on this problem space. This work empowers engineers and system designers with a faster, more reliable tool for refining complex systems. If you yourself are going to be building, modeling, or optimizing a distributed system anytime in the near future, then it might be worth your while to download the paper. The authors showcase a number of diagrams and pseudocode snippets that really walk through their algorithm in far more detail than we covered here.