Today's article comes from the International Journal of Electronics and Telecommunications. The authors are Niswar et al., from Hasanuddin University in Indonesia.
DOI: 10.24425/ijet.2024.149562
Now, if you’re like me, alarm bells are already ringing. These shootouts happen all the time, and in many cases the experimental design is seriously flawed. In my opinion, this paper is no exception. So rather than take the article at face value and report its findings as fact, we’re going to apply a more critical lens. I am going to present the research and its findings, but I’ll spend much more time than usual on history and context, and then use that context to inform counter-arguments to the analysis. This study is imperfect, but it’s not without value: the authors did unearth some interesting results; I just don’t think those results should be taken at face value, so we won’t. The way we’ll get the most out of this paper is through a critical, interrogative lens, so let’s do that.
Before we get to the paper itself, let’s review some history:
SOAP: Before any of the options above, there was SOAP. SOAP is an XML-based protocol that’s heavy, complicated, difficult to document, and bad at a number of things. But it did set the playing field for what was to come: a generic, serializable format for data interchange over HTTP. With SOAP, two systems built on completely different stacks could exchange data without knowing anything about each other, other than the WSDL (specification document) for the endpoint that was exposed. In the early days of the web, this was a fairly novel idea. For a few years it became the de facto standard, replacing XML-RPC (its predecessor) for many of the major public APIs of the time.
Then, in 2000, Roy Fielding defined the concept of REST in his doctoral dissertation. REST, which stands for Representational State Transfer, had three design advantages over SOAP:
For over a decade, REST was the undisputed king of web API protocols. If you were building a new web API, especially a public one, from the mid-2000s to the mid-2010s, there’s a huge chance you were building it as REST. Twitter’s extremely popular API was REST. So were Amazon’s, Flickr’s, and Google Maps’. And importantly: so was Facebook’s.
Unlike at those other sites, REST wasn’t working that well for Facebook as they scaled. Facebook was, famously, originally built on something like a LAMP stack. But as they grew, they ended up with a novel set of issues:
As the number of objects in the system grew, and the relationships between those objects multiplied, fetching information out of the system through a RESTful facade became slower and slower, and sending information into the system required chains of API requests and multistep database transactions. In response to these challenges, Facebook helped push two concepts into the mainstream:
Meanwhile, down the road at Google, they were facing a different problem. Google’s competitive edge had always been speed. But as they scaled and their systems became more distributed, that speed was harder and harder to maintain. This was partly due to the overhead of the interchange happening between servers. With the options available to them at the time (XML-RPC, SOAP, etc.), there was a delay associated with the serialization and deserialization of data. When a single client is talking to a single server, this delay might not be noticeable, but when a server needs to process a request by talking to many other servers, and those other servers need to talk to even more servers, all that communication being serialized and deserialized has a significant cumulative cost.
In response to this challenge, in 2001 they developed something called Protocol Buffers (“protobuf”). They wouldn’t release it publicly until 2008. Protobuf is a binary format, designed from the ground up to be interoperable, tiny, and fast. But on its own, it’s just a format, not an API protocol or a system. Google and the other companies adopting protobuf were largely using it as part of RPC (remote procedure call) clusters. At Google, the full system was called Stubby, and they kept it under wraps. That was until 2016, when they released the replacement for Stubby: gRPC. From the ground up, gRPC was built for efficiency. Unlike REST, which traditionally runs on HTTP/1 and usually uses JSON, gRPC runs on top of HTTP/2 and uses protocol buffers. In a nutshell: this makes gRPC capable of a level of bidirectional, realtime communication at scale that would be extremely difficult to attain any other way. But this comes at a cost. Namely: complexity. gRPC is extremely powerful and flexible, but it also has a significant learning curve. Because of this, it’s still largely used today for high-performance server-to-server communication within an organization. We haven’t yet seen it used extensively as a protocol for public, third-party-facing APIs. That is still the domain of REST and GraphQL.
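To make that concrete, here’s a minimal sketch of what a unary gRPC call looks like from Go. The `lecturerpb` package, its `LecturerService`, and the `GetLecturer` method are hypothetical stand-ins for whatever a protoc-generated stub would provide (none of this is from the paper); the point is that the payload is compact binary protobuf, and every call after the first reuses the same multiplexed HTTP/2 connection.

```go
package main

import (
	"context"
	"log"
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"

	// Hypothetical stub package generated by protoc; not from the paper.
	lecturerpb "example.com/gen/lecturerpb"
)

func main() {
	// Dial once; gRPC multiplexes all subsequent calls over this single
	// HTTP/2 connection instead of re-opening a connection per request.
	conn, err := grpc.Dial("localhost:50051",
		grpc.WithTransportCredentials(insecure.NewCredentials()))
	if err != nil {
		log.Fatalf("dial: %v", err)
	}
	defer conn.Close()

	client := lecturerpb.NewLecturerServiceClient(conn)
	ctx, cancel := context.WithTimeout(context.Background(), time.Second)
	defer cancel()

	// A unary RPC: request and response travel as binary protobuf
	// messages rather than JSON text.
	resp, err := client.GetLecturer(ctx, &lecturerpb.GetLecturerRequest{Id: "123"})
	if err != nil {
		log.Fatalf("GetLecturer: %v", err)
	}
	log.Println(resp.GetName())
}
```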
Okay, sorry, that was actually way more history than I thought I was going to give you. Now that you have all that backstory under your belt, we’re able to turn to this study and examine it using all that context as a tool in our analysis.
For this study the researchers’ plan was simple: they would create the same API three times, once with REST, once with GraphQL, and once with gRPC.
Then they would load test it, examine the response times of each, examine the resource utilization of the servers that were running each, and then pick a winner.
First they pulled down a dataset called SISTER, provided by Hasanuddin University. That dataset contains data about lecturers and those lecturers' educational backgrounds. You can think of it almost like the type of data that would be stored to power a site like LinkedIn. They loaded the data into MySQL, and then put a Redis cache in front of the DB. Essentially, the Redis instance functioned as a read replica for the database: virtually all the data was pulled into Redis, which in essence moved it into RAM, allowing for quick and efficient reads.
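As a rough sketch of that read path (my own illustration, not code from the paper), here is a read-through cache in Go using the go-redis client and database/sql. The `lecturer:<id>` key scheme and the `lecturers` table are assumptions made up for the example.

```go
package cache

import (
	"context"
	"database/sql"
	"errors"
	"time"

	"github.com/redis/go-redis/v9"
)

// GetLecturerName reads from Redis first and only falls back to MySQL on a
// cache miss, writing the result back into Redis afterwards.
func GetLecturerName(ctx context.Context, rdb *redis.Client, db *sql.DB, id string) (string, error) {
	key := "lecturer:" + id

	// Hot path: the value is already in RAM.
	if name, err := rdb.Get(ctx, key).Result(); err == nil {
		return name, nil
	} else if !errors.Is(err, redis.Nil) {
		return "", err // a real Redis error, not just a miss
	}

	// Cache miss: query MySQL, then repopulate the cache.
	var name string
	if err := db.QueryRowContext(ctx,
		"SELECT name FROM lecturers WHERE id = ?", id).Scan(&name); err != nil {
		return "", err
	}
	if err := rdb.Set(ctx, key, name, time.Hour).Err(); err != nil {
		return "", err
	}
	return name, nil
}
```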
Next, for each cluster they built three different microservices in Golang: an authentication service (#1), a service returning flat data (#2), and a service returning nested data (#3).
Each service was structured as a standalone application and lived in its own container. Services #2 and #3 would both sit behind service #1, so in order to fetch either flat or nested data you’d have to pass through the authentication service, which acted as the gateway to the rest of the system. They built one cluster of containers for gRPC, one for REST, and one for GraphQL.
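The paper doesn’t include the gateway code, but the shape of the design is roughly this kind of thing: an auth check in front of a reverse proxy to a downstream data service. The token check, service names, and addresses below are placeholders I’ve invented for illustration.

```go
package main

import (
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
)

func main() {
	// Hypothetical address for the flat-data service (#2).
	target, err := url.Parse("http://flat-data-service:8081")
	if err != nil {
		log.Fatal(err)
	}
	proxy := httputil.NewSingleHostReverseProxy(target)

	// Service #1: every request must pass an auth check before it is
	// forwarded downstream to service #2 or #3.
	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		if r.Header.Get("Authorization") == "" { // placeholder check
			http.Error(w, "unauthorized", http.StatusUnauthorized)
			return
		}
		proxy.ServeHTTP(w, r)
	})

	log.Fatal(http.ListenAndServe(":8080", nil))
}
```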
Then it was time to benchmark the performance of each cluster. For this they used Apache JMeter, and ran a series of load tests simulating two different types of invocation patterns: concurrent requests (many requests in flight at the same time) and consecutive requests (requests issued one after another).
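JMeter test plans don’t reproduce well in prose, so here is a tiny Go stand-in (not the authors’ harness) that shows the difference between the two patterns: concurrent requests fired from many goroutines at once, versus consecutive requests issued back-to-back from a single loop. The endpoint URL and the count of 500 are illustrative.

```go
package main

import (
	"fmt"
	"net/http"
	"sync"
	"time"
)

const target = "http://localhost:8080/lecturers" // hypothetical endpoint

func main() {
	// Concurrent pattern: 500 requests in flight at the same time.
	start := time.Now()
	var wg sync.WaitGroup
	for i := 0; i < 500; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			if resp, err := http.Get(target); err == nil {
				resp.Body.Close()
			}
		}()
	}
	wg.Wait()
	fmt.Println("concurrent:", time.Since(start))

	// Consecutive pattern: the same number of requests, one at a time.
	start = time.Now()
	for i := 0; i < 500; i++ {
		if resp, err := http.Get(target); err == nil {
			resp.Body.Close()
		}
	}
	fmt.Println("consecutive:", time.Since(start))
}
```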
Let’s go over what the tests revealed:
Concurrent requests: At the highest level of concurrency tested (500)
Consecutive requests: At the highest level of throughput tested (500)
So, in short, in all head-to-head comparisons (other than the last nested test), the results were the same: gRPC was fastest, REST was next, and GraphQL was far behind. In their evaluation of the study, the authors conclude:
This superiority can be attributed to gRPC’s adoption of the HTTP/2 protocol, a departure from REST and GraphQL, which rely on the HTTP/1 protocol. The efficient handling of data exchange provided by the HTTP/2 protocol is a significant factor contributing to gRPC’s enhanced performance in comparison to its counterparts.
I’d argue that the above paragraph does a reasonably good job of explaining the difference between the results of REST and gRPC. But in this experiment, both GraphQL and REST were running over HTTP/1, so what explains the difference in performance between those two options? In other words, why did REST nearly keep pace with gRPC, while GraphQL fell far behind? Well, I think this research is fundamentally missing the point of GraphQL. Remember: GraphQL emerged as a solution that allows you to send a smaller total number of requests to fetch the same information. So benchmarking it on the number of requests it can handle per second is misguided. This is not comparing apples to apples. A true comparison would be if the authors set up a series of interrelated objects that were each stored behind their own microservice, and then ran a shootout to compare how long it takes each protocol to fetch the same amount of interrelated data out of the system. If GraphQL is capable of pulling it all out with one fetch, while REST and gRPC have to perform dozens or even hundreds of consecutive fetches, then the conclusion we’d draw from the shootout might be much different. That difference would likely grow even larger if the GraphQL system were sitting in front of a graph database instead of Redis.
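To illustrate the apples-to-oranges problem, consider what “one fetch” means for each protocol. The sketch below uses a made-up schema and endpoint (the field names and URL are not from the SISTER dataset or the paper): GraphQL pulls a lecturer and every related education record in a single round trip, where a resource-per-endpoint REST design would need one call for the lecturer plus at least one more per related collection.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// Hypothetical query: one round trip retrieves the nested data.
const lecturerQuery = `{
  lecturer(id: "123") {
    name
    educations { degree institution year }
  }
}`

func main() {
	body, _ := json.Marshal(map[string]string{"query": lecturerQuery})

	// A resource-per-endpoint REST design would instead need
	//   GET /lecturers/123
	//   GET /lecturers/123/educations
	// plus one more round trip per additional relationship.
	resp, err := http.Post("http://localhost:8080/graphql",
		"application/json", bytes.NewReader(body))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	fmt.Println(resp.Status)
}
```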
Another issue is that of the application design. gRPC can keep connections open (since it’s running over HTTP/2, which is full-duplex). This means that after the initial connection to the server is opened, it never needs to be opened again, which removes some of the overhead that the other options incur on every request. In idiomatic REST, one way to counter this limitation is to:
With a design change such as the above, the REST system, for example, could eliminate roughly 50% of its total network requests, effectively raising the capacity of the system.
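Whatever the exact design change, one standard-library piece of the idea is simply to share a keep-alive HTTP client across calls so that TCP (and TLS) setup happens once per host rather than once per request. This is my own sketch, not the paper’s setup, and the pool sizes below are illustrative values.

```go
package restclient

import (
	"net/http"
	"time"
)

// A single shared client: Go's http.Transport keeps idle connections open
// and reuses them, so repeated REST calls to the same host avoid redoing
// the connection handshake each time.
var apiClient = &http.Client{
	Timeout: 5 * time.Second,
	Transport: &http.Transport{
		MaxIdleConns:        100,
		MaxIdleConnsPerHost: 100, // illustrative value
		IdleConnTimeout:     90 * time.Second,
	},
}

func fetch(url string) error {
	resp, err := apiClient.Get(url)
	if err != nil {
		return err
	}
	// Close (and ideally drain) the body so the underlying connection
	// can be returned to the pool and reused by the next call.
	defer resp.Body.Close()
	return nil
}
```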
Additionally, only profiling CPU utilization (and not RAM, network I/O, disk I/O or anything else) doesn’t give us a full picture of the resource-hungriness of these different systems. The authors' analysis presupposes that higher CPU utilization is universally bad, but it's much more complicated than that: What if, for example, one process takes 3x the CPU but a fraction of the Network I/O and half the RAM? What then?
Lastly (and I’ll stop here just for brevity’s sake, not because I’ve run out of red flags in this paper): this whole paper was about benchmarking read requests at high concurrency. In reality, this isn’t particularly meaningful, as any modern application operating at this level of concurrency would have caches sitting in front of its reads. A much more interesting benchmark would be inserts, updates, and deletes.
At the end of the day, I think there are some interesting takeaways from this paper. Chiefly, that the GraphQL application does seem CPU-hungry under heavy reads. But, as we explored earlier, reads per second and CPU utilization were never where GraphQL was meant to shine in the first place. Additionally, this research shows that gRPC is extremely fast, which is great, but what it doesn’t show is how the third-party developer experience is affected when you force developers to consume a gRPC API instead of a RESTful or GraphQL one. These things matter. At the end of the day, we’re building these applications and exposing these APIs so that other software engineers can build things on top of them. We should be kind to them. Speed matters, but so does usability.
Try as the authors may to paint gRPC, REST, and GraphQL as interchangeable sister technologies shooting it out in a winner-take-all competition, that’s just not reality. These tools are each good at different things, for different use cases, different consumers, and different invocation patterns. If you’re a software engineer, I’d encourage you to keep all three in your bag of tricks; you’ll undoubtedly find a time and place where each one of them shines.