It is the year 2017, and Kelsey Hightower is on the KubeCon stage. The microphone starts echoing: "Raise your hands if you think installing Kubernetes is easy." This is how the well-known Kubernetes advocate opened his presentation.
Explaining abstract concepts, as we all know, is complicated. How do you wrap your head around a concept such as a cluster?
The cluster is the core concept of Borg and, later on, of Kubernetes. But what is it, and why is it important?
Kelsey Hightower made his name by turning concepts like this into understandable metaphors.
In the same year, 2017, at the O’Reilly Software conference, he explained why you would use something like Kubernetes. He used the game Tetris. Imagine your machines are a Tetris board, and the falling blocks are workloads arriving with no awareness of CPU and memory. If every block simply drops straight down, unmoved, the board fills up quickly: the game is over, and your machines run out of capacity.
Now imagine you use Kubernetes to schedule these "blocks" of workload, fitting them into the machines' spare resources.
When blocks are moved to the best possible places, Tetris can be a never-ending game, and so can your cluster capacity. This cluster mentality of knowing where a workload fits best is one of the greatest advantages of using Kubernetes: it knows where to schedule each workload based on CPU and memory.
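To make the Tetris metaphor concrete, here is a toy first-fit scheduler in Python. This is only a sketch of the idea, not how Kubernetes actually schedules, and every name in it is made up: each workload is placed on the first machine with enough spare CPU and memory.

```python
# Each machine has spare [cpu, memory]; each workload requests (cpu, memory).
machines = {"node-a": [2.0, 4.0], "node-b": [4.0, 8.0]}

def schedule(workload_cpu, workload_mem):
    """First-fit: place the 'block' on the first node with enough room."""
    for name, (cpu, mem) in machines.items():
        if cpu >= workload_cpu and mem >= workload_mem:
            machines[name][0] -= workload_cpu   # carve out the requested CPU
            machines[name][1] -= workload_mem   # and the requested memory
            return name
    return None  # the board is full: no node can fit this block

print(schedule(1.0, 2.0))  # fits on node-a
print(schedule(3.0, 4.0))  # too big for node-a's remainder, goes to node-b
print(schedule(8.0, 8.0))  # nothing fits: None
```

The real scheduler weighs many more signals (affinity, taints, spread), but the core bin-packing intuition is the same.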
Kubernetes solved a lot of problems. It created a whole ecosystem and community around it. The rate of adoption and its ubiquity became a rallying point to many software engineers.
Docker helped make containers mainstream; however, it introduced a new problem: how do you use containers in production? Kubernetes solved that problem.
I'm your host, Kassandra Russel, and today we will walk you through the adoption Kubernetes went through and how it differentiated itself from other container orchestration systems.
We're going to discuss the early growing pains Kubernetes had. Finally, we'll talk about Kubernetes adoption best practices: should you self-host your own clusters or use a fully managed service? What is a good setup, a single cluster or multiple clusters?
Kubernetes gathers the best and brightest in the industry two or three times a year at a conference called KubeCon. This is where we can measure how much adoption Kubernetes has had within the industry.
In 2016, KubeCon's attendance was 1,136 people. It roughly quadrupled the next year to 4,000, doubled again in 2018 to 8,000, and reached 10,000 in 2019.
According to the CNCF, the Cloud Native Computing Foundation, Kubernetes is the second-biggest open-source project behind Linux.
The Cloud Native Computing Foundation is a Linux Foundation project that was founded in 2015. It gathers a group of companies and organizations sharing aspirations to build things upon Kubernetes and around its ecosystem. The founding members include Google, CoreOS, Mesosphere, Red Hat, Huawei, Intel, Cisco, IBM, Docker, Univa, VMware, and even the blue bird's social network, Twitter.
Usually, containers, Kubernetes, and microservices are associated with the Cloud Native way of building software.
The CNCF provides an official definition: Cloud-native technologies empower organizations to build and run scalable applications in modern, dynamic environments such as public, private, and hybrid clouds. Containers, service meshes, microservices, immutable infrastructure, and declarative APIs exemplify this approach.
If you build a cloud-native application, each part of your app should be packaged in its own container, dynamically orchestrated so each part is actively scheduled and managed to optimize resource utilization.
These techniques enable resilience, manageability, and observability. Combined with robust automation, they allow engineers to make high-impact changes frequently and predictably with minimal toil.
In other words, Cloud Native implies you use containers, container orchestration, and microservices. You not only adopt Kubernetes but also the Cloud Native mindset that comes with it.
Some companies implemented similar approaches and showed that the cloud-native way of building applications was helpful on a large scale.
The popular movie-streaming service Netflix, for instance, runs more than 600 services in production and deploys hundreds of times per day.
The ride-hailing company Uber is another example: it runs more than 1,000 services in production and deploys several thousand times each week.
These companies, along with using open-source tools like Kubernetes and Docker, built their own tools. Netflix, for instance, built open-source tools, runtimes, and libraries like Hystrix, Chaos Monkey, Governator, Prana, Eureka, and Zuul. And Netflix is not alone: take a look at the open-source projects on GitHub and you'll notice an ever-growing number of tools built around containerization and orchestration. This dizzying array of technologies and applications makes adopting the new mindset harder.
The CNCF has indexed a portion of these technologies: 1,390 tools with more than 2,000,000 GitHub stars, a combined market cap of $17.2T, and funding of $65.84B.
The outcome is that we are looking for a way out of the homemade-toaster era, where we build everything ourselves, toward ready-made components, thanks to the explosion of innovation within the Cloud Native space.
Alexis Richardson, former Technical Oversight Committee Chair
If you are interested in discovering new open-source tools as they are released, subscribe to the Kaptain weekly newsletter on faun.dev, and we will make sure to share the most interesting new tools with you each week.
Kubernetes is not just a hot topic; we, at FAUN, think it is the future of modern infrastructure. It is the operating system of data centers, and we believe it has a bright future, just as Linux had compared to other operating systems. This is why we dedicated a whole newsletter to this ecosystem.
Actually, a big reason for the adoption of Kubernetes is its community and its open-source approach to building cloud platforms.
From a purely technical view, another reason why you would want to adopt something like Kubernetes is the reproducibility and extensibility of such a platform.
In the same vein as using infrastructure as code to reproduce your infrastructure in a few steps, you can use declarative commands within Kubernetes. This type of command allows you to define what you want to achieve rather than how you want to achieve it.
The opposite of "declarative" in our context is "procedural". Do you know the difference? Even if you do, let us redefine it in a fun way.
According to cognitive psychologists like Endel Tulving, we can categorize knowledge into two types: declarative knowledge and procedural knowledge. The first involves knowing that something is the case: that A is the first letter of the alphabet, that Paris is the capital of France.
Declarative knowledge is conscious; it can often be verbalized.
Procedural knowledge involves knowing how to do something, like how to drive a car. We may not be able to explain how we do it; the best way to explain it is by doing it.
Applied to infrastructure, it works the same way: with declarative definitions, you don't need to think about how to do something; you only write what should be there. In the procedural approach, you need to describe the whole sequence of commands needed to achieve the goal.
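To make the contrast concrete, here is a minimal Python sketch. The toy "infrastructure", a set of running server names, and all the function names are hypothetical: the procedural version spells out each step, while the declarative version only states the desired end state and leaves the "how" to a generic engine.

```python
# A toy model of infrastructure: the set of currently running servers.
running = set()

# Procedural: spell out each step, in order.
def procedural_scale_up():
    running.add("web-1")   # step 1: start the first server
    running.add("web-2")   # step 2: start the second server

# Declarative: state only the desired end state...
desired = {"web-1", "web-2", "web-3"}

# ...and let a generic engine figure out the steps.
def apply(desired_state):
    for server in desired_state - running:
        running.add(server)       # create what is missing
    for server in running - desired_state:
        running.discard(server)   # remove what should not exist

apply(desired)
print(sorted(running))  # ['web-1', 'web-2', 'web-3']
```

Note that `apply` works for any desired state; you never rewrite the steps, only the goal.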
You may assume that "declarative" is the better approach because it seems easier, and especially because it is a trending topic today.
Let's apply some grey thinking here: the reality is not black and white.
The declarative approach is more sophisticated, for sure. The Terraform language, for example, is declarative; it is widely adopted, and those who use it have excellent outcomes. If you want to create repeatable, reproducible infrastructure, in most cases the declarative approach is outstanding.
Now, take the example of building a CI/CD pipeline using a tool such as Jenkins, where you need to describe, step by step, how your integration, tests, and build should be done. In reality, the procedural approach fits this last example better; the declarative approach may not be the most suitable.
Let us return to the matter at hand.
We were talking about the declarative way of creating and updating Kubernetes resources. This approach adds the ability to easily reproduce your resources wherever you are running your cluster. It separates the process of solving the problem from stating it.
This property also allows for extensibility: you can state the problem and create different ways of solving it. This paved the way for a new concept, Kubernetes Operators, which we will discuss in one of our coming episodes.
Kubernetes took this declarative property of abstracting away the implementation details from the end goal to the extreme. Its architecture is not tied to a specific cloud provider or a specific pattern; rather, it is fully customizable, whether you are running your data center in a public, private, or hybrid cloud.
This attracted myriads of people with a special interest in the Kubernetes space. Since Kubernetes is extensible and customizable, it allowed many people to innovate and interact with their clusters.
A consequence of having multiple special interests is that the governance of Kubernetes became somewhat decentralized. The general management of the project still lies with the steering committee; however, new working groups and special interest groups joined the field.
This is an interesting point: people working on different parts of Kubernetes, from the very specialized, domain-specific SIGs to the horizontal SIGs, all produce valuable contributions.
Kubernetes special interest groups work on projects across the whole stack; there are around 100 SIG projects and 200 main contributors.
If you have ever watched the show "The IT Crowd", you'll know there is a segment in almost every episode in which Roy asks: "Have you tried turning it off and on again?"
This puts a smile on your face! But there is in fact a reason for this.
Failure is inevitable; how you recover is what matters. Kubernetes won the orchestration wars because of its focus on self-healing.
In essence, it turns failing resources within your infrastructure off and on again. It tracks the state of each Kubernetes resource and "reconciles" it to the desired state. This is done via control loops, or what we refer to as "controllers" in the Kubernetes world.
There are multiple controllers built into Kubernetes. Nevertheless, you can customize them or build your own.
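As a rough sketch of the reconcile idea, and not how Kubernetes itself is implemented, a control loop can be as simple as: observe the current state, compare it with the desired state, and act to close the gap. All names below are made up for illustration.

```python
desired_replicas = 3
running = ["app-1"]  # pretend two replicas are missing after a failure

def reconcile(running, desired):
    """One pass of the control loop: converge current state toward desired."""
    while len(running) < desired:
        running.append(f"app-{len(running) + 1}")  # "turn it on": start a replica
    while len(running) > desired:
        running.pop()                              # "turn it off": stop a replica
    return running

# A real controller runs this loop forever, reacting to every state change;
# here we run a single pass.
reconcile(running, desired_replicas)
print(running)  # ['app-1', 'app-2', 'app-3']
```

The point is that the controller never encodes "what went wrong", only "what should be true", which is why the same loop heals crashes, scale-ups, and scale-downs alike.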
Each Kubernetes cluster has a data model that is stored in the database backend. This data model is the key to describing multiple resources. Having a data model also allows you to create extensions by combining the declarative method with the control loops that live in each controller. These controllers are distributed across multiple machines within a Kubernetes cluster.
Having multiple machines working together presents another class of problems: distributed systems problems. Kubernetes is not immune to them; however, being built on solid foundations such as etcd means you don't have to solve these problems again yourself.
etcd is the database of Kubernetes, and it handles several problems arising from the distributed nature of the system, for example, the problem of consensus. Consensus, in computer science, is the problem of getting multiple computers to agree on a value.
Imagine two generals trying to invade a castle. Both of them need to attack at the same time, or the battle is sure to be lost. One general sends a message to the other using a pigeon; the other general does the same. But for some reason, one or both messages never reach the intended recipient. How do you build consensus on whether to attack or not?
etcd uses the Raft algorithm to circumvent this consensus problem within distributed computing. Plus, it gives Kubernetes a lot of distributed-computing primitives to build on top of.
Finally, Kubernetes and containers in general encourage developers to build systems that are immutable in nature. There is a lot of value in building immutability into your infrastructure.
First off, you no longer need to patch containers. Instead, if a vulnerability is found or a patch is required, you simply deploy another version of the container, which greatly reduces operational and maintenance overhead.
Second, it allows for greater scalability. If applications are immutable in nature, then having multiple replicas, or copies of the same thing, is easy. You can deploy a new version of an application alongside the old one and gradually phase the old version out. In the Kubernetes world, this is referred to as a "rolling update", and it is the default strategy when deploying new containers.
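As a rough illustration of the rolling-update idea (a toy sketch, not Kubernetes' actual implementation; all names are made up), replicas are replaced a batch at a time so some copies are always serving traffic:

```python
def rolling_update(replicas, new_version, max_unavailable=1):
    """Replace replicas one batch at a time; the rest keep serving traffic."""
    replicas = list(replicas)
    for start in range(0, len(replicas), max_unavailable):
        for i in range(start, min(start + max_unavailable, len(replicas))):
            replicas[i] = new_version  # old copy phased out, new copy phased in
        # A real system would wait for the new batch to pass health checks
        # here before moving on to the next one.
    return replicas

fleet = ["app:v1", "app:v1", "app:v1"]
print(rolling_update(fleet, "app:v2"))  # ['app:v2', 'app:v2', 'app:v2']
```

Immutability is what makes this safe: since every copy of a given version is identical, old and new replicas can coexist during the rollout.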
Third, immutable infrastructure and immutable applications allow us to recreate them in a consistent state. Each time you deploy the same version of the container, the state of the application remains the same. You no longer have to worry about whether the application behaves differently depending on what already existed on the machine when you deployed. In other words, it gives you peace of mind and more sleep, even while on-call.
Speaking of being on-call, one of our future episodes will cover this topic in depth. If you want to suggest a topic, you can find us on Twitter under the joinFAUN username.
Back to Kubernetes: where does it fit in this new Cloud Native architecture?
Well, Google had this idea that “we must treat the data center itself as a massive warehouse-scale computer”.
According to Kelsey Hightower, the idea behind this is that we want to abstract away the individual machines and just treat them as one logical thing. So in order to treat the data center as a computer what do you need?
You need an Operating System. What if you remove the ability to log in to machines? How would you think differently? Kubernetes becomes your framework for building distributed systems. It can act as the Kernel of your Operating System and you can build your components on top of it to make it a fully-fledged data center operating system.
In essence, Kubernetes will then become the base layer of your new Cloud Native Architecture.
Don't forget to follow joinFAUN on Twitter. You can also join our online community by visiting faun.dev/join.
If you want to reach us, you can also use our email email@example.com.
If you love the DevOps Fauncast, we'd love for you to subscribe, rate, and give a review on iTunes.
Until next time!