Part of the reason why securing Kubernetes can be challenging is that Kubernetes isn’t a single, simple framework. It’s a complex, multi-layered beast.
Each layer—from code and containers to clusters and external cloud services—poses its own set of security challenges and requires unique solutions. Any issue at one layer, such as a vulnerability in a container, gets amplified if another layer has a security issue, such as an unencrypted database.
The good news in the face of this complexity is that it’s possible to use a single methodology—infrastructure as code (IaC)—to secure all layers of Kubernetes. Furthermore, when you define the various layers of your Kubernetes environment using an IaC approach and include automated IaC security in your workflow, you maximize your ability to prevent misconfigurations that could create Kubernetes security vulnerabilities in your clusters.
To explain how to leverage IaC as the foundation for Kubernetes infrastructure security, this article walks through nine Kubernetes security best practices that apply to the various layers or components of Kubernetes. It also shows how IaC or related security automation techniques can help mitigate each challenge.
From a Kubernetes security standpoint, let’s start our discussion at (what some might consider) the most basic layer of a Kubernetes environment: the host infrastructure. Here, we’re talking about the bare metal and/or virtual servers that serve as Kubernetes nodes.
Securing this infrastructure starts with ensuring that each node (whether it’s a worker or a control plane node) is hardened against security risks. An easy way to do this is to provision each node using IaC templates that enforce security best practices at the configuration and operating system level.
When you write your IaC templates, ensure that the image template and/or the startup script for your nodes are configured to run only the software that is strictly necessary to serve as nodes. Extraneous libraries, packages, and services should be excluded. You may also want to provision nodes with a kernel-level security hardening framework like SELinux or AppArmor, as well as basic hygiene like encrypting any attached storage.
Depending on where and how you run Kubernetes, you may use an external cloud service to manage access controls for your Kubernetes environment. For example, if you use AWS EKS, you’ll use AWS IAM to grant varying levels of access to Kubernetes clusters based on individual users’ needs. (Note that this is different from internal access controls within Kubernetes clusters, which are defined using Kubernetes RBAC.)
Managing IAM roles and policies as code, with a least-privilege approach, minimizes the risk of manual configuration errors that could grant overly permissive access to the wrong user or service. You can also scan your configurations with IaC scanning tools (such as our open-source tool, Checkov) to automatically catch overly permissive or unused IAM roles and policies.
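To make the idea of catching overly permissive policies concrete, here is a minimal sketch of the kind of rule a scanner might apply. It is not how Checkov is implemented; the function names, the example policy, and the account ID and cluster name in the ARN are all hypothetical, and a real scanner checks far more than bare wildcards.

```python
from typing import Any


def _as_list(value: Any) -> list:
    """IAM accepts a single value or a list in several fields; normalize to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def find_overly_permissive_statements(policy: dict) -> list[dict]:
    """Return Allow statements whose Action or Resource is a bare "*" wildcard."""
    flagged = []
    for statement in _as_list(policy.get("Statement")):
        if statement.get("Effect") != "Allow":
            continue
        actions = _as_list(statement.get("Action"))
        resources = _as_list(statement.get("Resource"))
        if "*" in actions or "*" in resources:
            flagged.append(statement)
    return flagged


# Hypothetical policy: the first statement is scoped, the second is not.
example_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DescribeOneCluster",
            "Effect": "Allow",
            "Action": ["eks:DescribeCluster"],
            "Resource": "arn:aws:eks:us-east-1:111122223333:cluster/demo",
        },
        {"Sid": "TooBroad", "Effect": "Allow", "Action": "*", "Resource": "*"},
    ],
}

for statement in find_overly_permissive_statements(example_policy):
    print("overly permissive:", statement.get("Sid"))
```

Because the policy lives in code, a check like this can run on every commit rather than after someone notices a problem in the console.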
Container registries, which are platforms that store container images, aren’t a native part of Kubernetes. But they’re widely used as part of a Kubernetes-based application deployment pipeline to host the images that are deployed into a Kubernetes environment.
Access control frameworks vary between container registries. Some registries (like Amazon ECR) can manage access via public cloud IAM frameworks, while others rely on their own built-in access controls. Either way, you can typically define access permissions using code, then apply and manage them using IaC.
In doing so, you’ll want to follow the principle of least privilege: container images should only be accessible by registry users who need to access them. Just as important, unauthorized users should also be prevented from uploading images to a registry. Insecure registries are a great way for threat actors to push malicious images into your Kubernetes environment.
Avoiding the default namespace is a low-hanging yet easy-to-miss practice. By default, every Kubernetes cluster contains a namespace named “default.” As the name implies, the default namespace is where workloads will reside by default unless you create other namespaces.
Although using the default namespace isn’t the worst mistake you can make, it presents two security concerns.
One is that everyone will know the name of your namespace, a very important configuration value, making it that much easier for attackers to exploit your environment. Again, that doesn’t mean you’ll instantly be hacked just because you use the default namespace, but the less information you make available to the bad guys, the better.
The other, more significant concern with default namespaces is that if everything runs there, then your workloads are not segmented. It’s a better idea to create separate namespaces for separate workloads. Doing so makes it harder for a breach against one workload to escalate into a cluster-wide issue.
As a bonus, Google Cloud says that using the default namespace could be bad from an administrative perspective, too.
To avoid this, you can create new namespaces using kubectl or define them in a YAML file. You can also scan existing YAML files to detect instances where workloads are configured to run in the default namespace.
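As a rough illustration of that kind of scan, the sketch below treats each manifest as a Python dict with the same shape as its YAML form and flags namespaced resources that would land in the default namespace. The function name, the "payments" namespace, and the sample manifests are hypothetical, and the list of cluster-scoped kinds is deliberately incomplete.

```python
# Non-exhaustive set of cluster-scoped kinds, which never carry a namespace.
CLUSTER_SCOPED_KINDS = {
    "Namespace",
    "Node",
    "ClusterRole",
    "ClusterRoleBinding",
    "PersistentVolume",
}


def uses_default_namespace(manifest: dict) -> bool:
    """True if a namespaced manifest omits metadata.namespace or sets it to "default"."""
    if manifest.get("kind") in CLUSTER_SCOPED_KINDS:
        return False
    namespace = (manifest.get("metadata") or {}).get("namespace")
    return namespace in (None, "", "default")


# Hypothetical manifests: a dedicated namespace, a pod inside it, and a stray pod.
manifests = [
    {"apiVersion": "v1", "kind": "Namespace", "metadata": {"name": "payments"}},
    {"apiVersion": "v1", "kind": "Pod", "metadata": {"name": "api", "namespace": "payments"}},
    {"apiVersion": "v1", "kind": "Pod", "metadata": {"name": "debug"}},
]

offenders = [m["metadata"]["name"] for m in manifests if uses_default_namespace(m)]
print("resources in the default namespace:", offenders)
```

One caveat: a manifest without a namespace field is deployed into whatever namespace the client is currently set to, which is "default" unless someone changed it. Flagging the omission is the conservative choice.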
At the container level, a key security consideration is making sure that containers can’t run in privileged mode.
This is easy to do using IaC. Write a security context that denies privilege escalation (by setting allowPrivilegeEscalation to false), and make sure to include that context for every container when defining a pod.
Then, make sure privilege isn’t granted directly with the “privileged” flag or by granting CAP_SYS_ADMIN. Here again, you can use IaC scanning tools to check for the absence of this security context and to catch any other privilege escalation settings within pod settings.
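The three conditions above (no privilege escalation, no privileged flag, no CAP_SYS_ADMIN) can be sketched as a small check over a pod definition. This is an illustrative sketch, not the rule set of any particular scanner; the pod names, image references, and function name are hypothetical, and the pods are written as Python dicts that mirror the YAML structure.

```python
def privilege_findings(pod: dict) -> list[str]:
    """List privilege-related problems in a pod's containers and init containers."""
    findings = []
    spec = pod.get("spec", {})
    containers = spec.get("containers", []) + spec.get("initContainers", [])
    for container in containers:
        name = container.get("name", "<unnamed>")
        ctx = container.get("securityContext") or {}
        if ctx.get("privileged") is True:
            findings.append(f"{name}: privileged is true")
        # An unset value is flagged too, since only an explicit false denies escalation.
        if ctx.get("allowPrivilegeEscalation") is not False:
            findings.append(f"{name}: allowPrivilegeEscalation is not set to false")
        added = (ctx.get("capabilities") or {}).get("add") or []
        # Kubernetes manifests usually omit the CAP_ prefix, so accept both spellings.
        if "SYS_ADMIN" in added or "CAP_SYS_ADMIN" in added:
            findings.append(f"{name}: adds CAP_SYS_ADMIN")
    return findings


# Hypothetical hardened pod: escalation denied, not privileged, capabilities dropped.
hardened_pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "web"},
    "spec": {
        "containers": [
            {
                "name": "web",
                "image": "registry.example.com/web:1.4.2",
                "securityContext": {
                    "privileged": False,
                    "allowPrivilegeEscalation": False,
                    "capabilities": {"drop": ["ALL"]},
                },
            }
        ]
    },
}

# Hypothetical risky pod: privileged, with SYS_ADMIN added.
risky_pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "debug"},
    "spec": {
        "containers": [
            {
                "name": "debug",
                "image": "registry.example.com/debug:0.9.0",
                "securityContext": {
                    "privileged": True,
                    "capabilities": {"add": ["SYS_ADMIN"]},
                },
            }
        ]
    },
}

for pod in (hardened_pod, risky_pod):
    print(pod["metadata"]["name"], privilege_findings(pod))
```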
By default, any pod running in Kubernetes can talk to any other pod over the network. That’s why, unless your pods actually need to talk to each other (which is usually only the case if they are part of a related workload), you should isolate them to achieve a higher level of segmentation between workloads.
As long as you have a Kubernetes networking layer that supports Network Policies (most do, but some Kubernetes distributions default to CNIs that lack this support), you can write a Network Policy to isolate pods at the network level or to place specific restrictions on network connections between pods.
For example, a policy can select a set of pods (say, pods with the “role=db” label) and specify which other pods they may accept connections from (ingress) and open connections to (egress).
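To illustrate, the sketch below builds such a policy as a Python dict and prints it as JSON. The policy name, the "payments" namespace, the "role=backend" label, and port 5432 are assumptions chosen for the example; only the "role=db" selector comes from the discussion above.

```python
import json


def build_db_network_policy(namespace: str) -> dict:
    """Build a NetworkPolicy that restricts ingress and egress for pods labeled role=db."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "db-isolation", "namespace": namespace},
        "spec": {
            # The pods this policy applies to.
            "podSelector": {"matchLabels": {"role": "db"}},
            "policyTypes": ["Ingress", "Egress"],
            # Only backend pods may connect to the database port.
            "ingress": [
                {
                    "from": [{"podSelector": {"matchLabels": {"role": "backend"}}}],
                    "ports": [{"protocol": "TCP", "port": 5432}],
                }
            ],
            # Database pods may only open connections to other database pods
            # (for example, for replication).
            "egress": [
                {
                    "to": [{"podSelector": {"matchLabels": {"role": "db"}}}],
                    "ports": [{"protocol": "TCP", "port": 5432}],
                }
            ],
        },
    }


policy = build_db_network_policy("payments")
print(json.dumps(policy, indent=2))
```

Since kubectl accepts JSON as well as YAML, the printed output could be saved to a file and applied like any other manifest. Note that an egress rule this strict also blocks DNS lookups from the selected pods, so a real policy may need an additional rule for that.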
Here again, the beauty of defining all of this in code is that we can scan the code to check for configuration mistakes or oversights that may grant more network access than we intended.
When you tell Kubernetes to pull an image from a container registry without specifying a tag, it will automatically pull the version of the image labeled with the “latest” tag in the registry.
While this may seem logical (after all, you typically want the latest version of an application), it can be risky from a security perspective because relying on the “latest” tag makes it more difficult to track the specific version of a container image that you are using. In turn, you may not know whether your containers are subject to security vulnerabilities or image poisoning attacks that affect specific versions of an image.
To avoid this mistake, specify image versions or, better yet, image digests when pulling images, and audit your Kubernetes configurations to detect images that lack a specific version. It’s a little more work, but it’s worth it from a Kubernetes security perspective.
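An audit like this mostly comes down to inspecting image references. The following sketch shows one way to decide whether a reference is pinned; the function name and registry hostnames are hypothetical, and the digest is a placeholder rather than a real image digest.

```python
def image_is_pinned(image: str) -> bool:
    """True if the reference has a digest, or an explicit tag other than "latest"."""
    if "@sha256:" in image:
        return True
    # A colon only marks a tag when it appears after the last slash;
    # earlier colons belong to a registry port, as in "localhost:5000/app".
    last_segment = image.rsplit("/", 1)[-1]
    if ":" not in last_segment:
        return False  # no tag at all, so "latest" is implied
    tag = last_segment.rsplit(":", 1)[1]
    return tag not in ("", "latest")


# Placeholder digest for illustration only.
PLACEHOLDER_DIGEST = "sha256:" + "a" * 64

examples = [
    "nginx",
    "nginx:latest",
    "nginx:1.25.3",
    "localhost:5000/app",
    "localhost:5000/app:2.0",
    "registry.example.com/app@" + PLACEHOLDER_DIGEST,
]

for image in examples:
    status = "pinned" if image_is_pinned(image) else "NOT pinned"
    print(f"{image}: {status}")
```

A tag can still be moved to point at a different image after the fact, which is why a digest is the stronger of the two options.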
Regardless of which container image version you use, you should never assume it’s secure. The same is true of Helm charts. Millions of public container images and Helm charts turn out to contain security misconfigurations and vulnerabilities, sometimes very severe ones.
Before using open-source Helm charts from Artifact Hub or elsewhere, you should scan them for misconfigurations. Similarly, before deploying container images, you should scan them using tools to identify vulnerable components within them. This isn’t exactly IaC, but it’s similar because it allows you to automate the identification of security risks within Kubernetes before putting them into production.
If a security incident (or, for that matter, a performance incident) does occur, Kubernetes audit logs tend to be very helpful in researching it. This is because audit logs record every request to the Kubernetes API server and its outcome.
Unfortunately, audit logs aren’t enabled by default in most Kubernetes distributions. To turn this feature on, set the --audit-log-path and --audit-policy-file flags in your kube-apiserver configuration (for example, in the API server’s static pod manifest on kubeadm-based clusters).
These flags tell Kubernetes where to store audit logs and where to find the policy file that configures what to audit.
Because audit logs offer critical security information, it’s worth including a rule in your Kubernetes configuration scans to check whether audit logs are enabled.
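A rule like that can be as simple as checking the API server's argument list for the audit flags. The sketch below assumes the arguments have already been extracted from wherever your kube-apiserver configuration lives; the function name and the file paths in the example are hypothetical.

```python
REQUIRED_AUDIT_FLAGS = ("--audit-log-path", "--audit-policy-file")


def missing_audit_flags(apiserver_args: list[str]) -> list[str]:
    """Return the audit flags that are absent or have an empty value."""
    configured = set()
    for arg in apiserver_args:
        name, _, value = arg.partition("=")
        if value:
            configured.add(name)
    return [flag for flag in REQUIRED_AUDIT_FLAGS if flag not in configured]


# Hypothetical argument lists, as they might appear in an API server manifest.
with_auditing = [
    "kube-apiserver",
    "--authorization-mode=Node,RBAC",
    "--audit-log-path=/var/log/kubernetes/audit.log",
    "--audit-policy-file=/etc/kubernetes/audit-policy.yaml",
]
without_auditing = ["kube-apiserver", "--authorization-mode=Node,RBAC"]

print("missing:", missing_audit_flags(with_auditing))
print("missing:", missing_audit_flags(without_auditing))
```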
As you can tell, Kubernetes is not very secure-by-default, and, unfortunately, there is no “one dumb trick” that can solve all your Kubernetes security woes. Instead, Kubernetes security requires a multi-pronged approach that addresses the security risks that exist across the various layers of Kubernetes.
Fortunately, by leveraging an IaC-based approach to defining security rules in Kubernetes, you can more easily scan your configurations and minimize the risk of configuration issues that will lead to security breaches within any layer of your Kubernetes stack.