Today's article comes from MDPI's Journal of Mathematics. The authors are Mondal et al., from the Macau University of Science and Technology. In this paper, the authors answer a simple question: What can you do when you outgrow Kubernetes? They point out six specific areas where k8s struggles to scale, and develop a suite of solutions to break through those bottlenecks. Let's dive in.
DOI: 10.3390/math12162476
Way back in 2014, Google released an open-source version of Borg, their in-house cluster management system. Borg was immense and highly specific to Google’s systems and servers, but this new open-source version would be different: it would be smaller, portable to different types of systems and hardware, focused narrowly on container orchestration, and generally useful for a wide range of workloads. They called the new project Kubernetes.
Kubernetes came out at exactly the right time. Adoption of Docker had spiked in the previous years, microservices were all the rage, and companies of all sizes were grappling with a non-trivial task: container orchestration. With no robust turnkey solutions available, developers were either building one-off systems from scratch, or struggling to port their existing CI/CD systems to work with containers. Google was uniquely positioned to step in. They had been early adopters of containers, and had spent a decade learning how to run them at scale. Kubernetes (or k8s as it came to be known) was the distillation of everything they’d learned...just scaled down.
It’s now ten years later. Kubernetes is as popular as ever, and is still gaining market share. The enterprise adoption of Kubernetes (and the workloads those enterprises are running) has meant that the framework is being pushed beyond its original scope. Remember, k8s was supposed to be a scaled-down version of Borg, not a fully-featured replacement. And now k8s is being pushed to levels of scale and complexity that, arguably, it was never intended to support.
As k8s users have scaled their deployments, they’ve hit serious growing pains and have had to devise workarounds for the bottlenecks. The authors of this study are some such users. This paper details the playbook they've developed for scaling k8s clusters beyond the framework's original limits.
Before we jump into the list of modifications they made, let’s review the high-level architecture of Kubernetes as a whole. K8s has a distributed Primary-Secondary architecture (what we used to refer to as "Master-Slave"). The brain of the system is the Primary node. It’s responsible for scheduling, coordinating, and assigning tasks to the Secondary nodes, as well as storing configuration data. The Secondary nodes receive and process task instructions from the Primary node, and are responsible for governing the full lifecycle of the containers they carry. There are also a number of standalone CLI tools and SDKs/APIs that let a user interact with and configure any of the servers in their system. The CLI abstracts away the Primary-Secondary convention, opting instead to present Pods, Labels, Deployments, Services, and other high-level concepts to the user. As a whole, the k8s system is designed to optimize for three goals:
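To make that abstraction concrete, here’s a minimal sketch using the official `kubernetes` Python client. It lists Pods and Deployments rather than talking to Primary or Secondary nodes directly; the reachable cluster, local kubeconfig, and `default` namespace are assumptions for illustration, not details from the paper.

```python
# A minimal sketch of the high-level abstraction the k8s SDKs expose.
# Assumes `pip install kubernetes`, a reachable cluster, and a valid kubeconfig.
from kubernetes import client, config

config.load_kube_config()  # read cluster credentials from ~/.kube/config

core = client.CoreV1Api()
apps = client.AppsV1Api()

# The API surfaces Pods, Deployments, Services, etc. -- not Primary/Secondary nodes.
for pod in core.list_pod_for_all_namespaces().items:
    print(f"pod: {pod.metadata.namespace}/{pod.metadata.name}")

for deploy in apps.list_namespaced_deployment(namespace="default").items:
    ready = deploy.status.ready_replicas or 0
    print(f"deployment: {deploy.metadata.name} ({ready}/{deploy.spec.replicas} ready)")
```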
So what’s wrong with this architecture? Nothing, really. It’s been suitable for hundreds of thousands of systems, big and small. Right out of the box it offers more power and configuration options than most users will ever need. The authors of this article, however, are part of the tiny slice of users who have managed to outgrow it. As their systems grew in scale and complexity, they ran into six distinct problems:
The authors proposed a multi-part solution that addresses each of the above issues.
Once their suite of solutions was operational, the authors deployed their cluster and benchmarked it using the load-testing tools Fortio and Gatling. The results were promising. When the number of concurrent requests scaled beyond 2,000, their system shaved 7.6% off the average request time and reduced request failures by a minimum of 32%. And while CPU and memory usage weren’t reduced by a significant margin, they also didn’t increase, which is definitely a win.
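For a sense of what a benchmark like that measures, here’s a toy load-test sketch in Python using asyncio and aiohttp. It is emphatically not the authors’ Fortio/Gatling harness; the target URL, concurrency level, and per-worker request count are placeholders.

```python
# Toy concurrency benchmark: average latency and failure count at a fixed
# concurrency level. A rough sketch, not the authors' Fortio/Gatling setup.
import asyncio
import time

import aiohttp

TARGET_URL = "http://localhost:8080/healthz"  # placeholder endpoint
CONCURRENCY = 2000                            # the paper's high-load regime
REQUESTS_PER_WORKER = 10

async def worker(session, latencies, failures):
    for _ in range(REQUESTS_PER_WORKER):
        start = time.perf_counter()
        try:
            async with session.get(TARGET_URL) as resp:
                await resp.read()
                if resp.status >= 400:
                    failures.append(resp.status)
        except aiohttp.ClientError:
            failures.append("connection-error")
        latencies.append(time.perf_counter() - start)  # count failed requests too

async def main():
    latencies, failures = [], []
    connector = aiohttp.TCPConnector(limit=CONCURRENCY)
    async with aiohttp.ClientSession(connector=connector) as session:
        await asyncio.gather(*(worker(session, latencies, failures)
                               for _ in range(CONCURRENCY)))
    print(f"requests: {len(latencies)}, failures: {len(failures)}")
    print(f"avg latency: {sum(latencies) / len(latencies):.4f}s")

if __name__ == "__main__":
    asyncio.run(main())
```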
So what can we take away from this research? Narrowly, if you’re scaling a Kubernetes cluster and are pushing it to the limits of its performance, the tips and tweaks in this paper should help you break through some of those bottlenecks.
But more generally, I think the authors’ methodical approach to scaling is broadly applicable. That is: as you push systems to their limits, your instinct may be to tear everything down and start over; to re-architect the whole thing around a different design pattern. While that approach may be satisfying, it’s fraught with risk, and rarely solves the problem without creating far more. A significantly more conservative approach (slowly and deliberately modifying key pieces of the infrastructure one little bit at a time) may be a better option. The key is to focus on one bottleneck at a time, solve it, then move on to the next.
If you’re interested in the details of exactly how they implemented their solutions, I’d encourage you to download the paper.