Advanced Terraform Security: Pro Tips for Secure Infrastructure as Code

In simpler times, IT operations had the responsibility to configure and deploy new infrastructure, mostly via command line interface (CLI) and scripts. Today, even those with the most advanced command line prowess cannot keep up with the scalability and agility demands of companies building and deploying infrastructure in the cloud.

Infrastructure as code (IaC) is designed to provision cloud resources entirely through code and automation. IaC frameworks such as Terraform and CloudFormation make infrastructure management more repeatable, dependable, and scalable.

As we covered in our recent blog post, Terraform 101: Best practices for secure infrastructure as code, Terraform is one of the most popular frameworks that enables all the benefits of IaC, including security. In this post, we’re continuing our Terraform security journey to look at some advanced tips for using Terraform to reduce security and compliance errors, streamline deployments, and harden security.

Scan Terraform HCL as early as possible

Similar to how SAST and DAST tools inspect the different components of your application for known vulnerabilities in your software or in its dependencies, you can scan for misconfigurations in code. To check for issues, or misconfigurations, you simply scan Terraform code against policies that determine whether the infrastructure is considered secure and compliant. These policies help maintain security, comply with regulations, and enforce operational best practices. They can range from simple things, like ensuring an S3 bucket is encrypted, to more advanced checks, like enforcing multi-factor authentication for all users. We’ll explore how to write your own policies later on.

Bridgecrew’s Terraform policies range from simple public exposure of resources to more complex, graph-based policies. Here are some of our more popular policies:

Just as important as the policies themselves are the points at which you run scans. Any scanning and policy enforcement is better than none, but to take your Terraform security to the next level, you should address security as early in the DevOps lifecycle as possible, and tailor it as closely as possible to your specific environment and needs.

Scan in pull/merge requests, CI/CD, and runtime

One place you can benefit from automated scans is in the pull/merge request stage. Every time you push a change to an infrastructure-as-code file, a tool like Bridgecrew will automatically scan it and report failures as a GitHub comment. The comment will contain full details about the failure, such as policy name, severity, and a link to additional remediation options in the platform.

For example, let’s say you were to add an S3 bucket to a Terraform template. Bridgecrew can spot a misconfiguration (in this case, whether or not the S3 bucket has logging enabled) and report its severity. It will then add comments alongside the relevant code to help you make the appropriate fixes.

In CI/CD, you can actually block builds that don’t match your stated policies. For example, if your misconfiguration shows up as critical severity, you can automatically prevent it from being deployed or added to your repository.
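To make the idea of blocking a build concrete, here is a minimal Python sketch of a severity gate. The findings structure, the severity names, and the functions `blocking_findings` and `gate` are all hypothetical; they are not Bridgecrew’s output format or API, just an illustration of the decision a pipeline step makes.

```python
# Hypothetical severity ranking; real scanners define their own levels.
SEVERITY_RANK = {"low": 1, "medium": 2, "high": 3, "critical": 4}


def blocking_findings(findings, threshold="critical"):
    """Return the findings whose severity is at or above the threshold."""
    limit = SEVERITY_RANK[threshold]
    return [f for f in findings if SEVERITY_RANK[f["severity"].lower()] >= limit]


def gate(findings, threshold="critical"):
    """Return an exit code for the pipeline step: 1 blocks the build, 0 passes."""
    blockers = blocking_findings(findings, threshold)
    for f in blockers:
        print(f"BLOCKED: {f['policy']} ({f['severity']}) on {f['resource']}")
    return 1 if blockers else 0


# Illustrative findings, not real scanner output.
SAMPLE_FINDINGS = [
    {"policy": "S3 bucket logging disabled", "severity": "medium",
     "resource": "aws_s3_bucket.logs"},
    {"policy": "Security group open to the world", "severity": "critical",
     "resource": "aws_security_group.web"},
]

exit_code = gate(SAMPLE_FINDINGS)
```

In a real pipeline, the step would typically pass the returned code to `sys.exit`, so that a non-zero value fails the job and prevents the merge or deployment. Lowering `threshold` to `"medium"` would make the gate stricter.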

Scanning in early development stages is especially important because changes are easier to make before the complexities of subsequent development stages are introduced. Developers given feedback in early stages will still have the code fresh in their minds, and so they are more likely to understand where and how to make changes.

One place to start is to implement security scanning in your integrated development environment (IDE), such as VS Code or IntelliJ. By using our VS Code extension, you can place guardrails directly where developers are coding, improving the quality of code while saving time and resources spent identifying, detecting, and resolving issues further downstream. For example, if you were to scan an EC2 instance, you might find flagged misconfigurations around data stored in an S3 bucket that must be securely encrypted at rest. The scan here would tell you where in the code to add encryption.

Another easy way to get fast, local feedback is running scans straight from your command line. Regardless of where you’re getting early feedback, being able to address issues before your Terraform code is integrated into a shared repository is crucial. That said, because of Terraform’s modular and dependency-driven nature, you can’t always find all relevant misconfigurations with just the code.

Your IDE is one place to provide individual feedback, but it’s also important to scan Terraform HCL consistently within shared repositories. Whether that’s in your VCS or build pipeline is up to you and your tools. A few IaC security tools, including Bridgecrew, have more advanced native integrations with version control systems that allow you to configure how scan results appear (either as checks or as comments) and even fix issues directly in pull requests. Bridgecrew also allows you to configure checks either to block merging or just to be informational. For less severe misconfigurations, you can configure code comments that explain, in the context of the code, how to implement the necessary fixes.

Pair GitOps with continuous integration and continuous deployment (CI/CD)

To up-level your Terraform security, adopting GitOps is key. This might seem like an obvious first step but it’s important to note that GitOps encompasses more than simply incorporating a version control system (VCS).

Simply put, GitOps is an operating model that applies the version control, collaboration tools, and CI/CD used in application development to infrastructure automation. In the GitOps model, the open-source version control system Git is the single source of truth, making all changes visible and verifiable. GitOps and infrastructure as code go hand-in-hand. Although the term was originally coined to describe Kubernetes orchestration, the model carries over naturally to Terraform.

With GitOps, we manage Terraform infrastructure using the same tools as for application software development. We store the Terraform code in a version control system like GitHub, so we can modify and review it collaboratively. When an update is pushed into the repository, a suite of tests and validations is triggered. If the tests complete successfully, Terraform deploys the changes in the cloud for us.

Adopting GitOps is a critical step in realizing the benefits of IaC, but doing so from the outset is what separates advanced teams from those that might try to retrofit a deployment pipeline into an already running system.

A CI/CD workflow (or your pull/merge request checks) can test, build, and simultaneously deploy the application and its supporting infrastructure. Moreover, CI/CD allows us to introduce checks into the whole process, ensuring the new environment adheres to security best practices and complies with necessary or relevant compliance benchmarks. Scanning later (with the right tools) also allows us to get feedback on compiled Terraform code—aka Terraform plan output. The Terraform plan stage often contains dynamic values that do not exist in Terraform HCL and thus provides a more complete representation of what configuration changes are going to be applied.
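As a rough illustration of scanning plan output, the Python sketch below walks the JSON produced by `terraform show -json` and flags planned security groups that allow SSH from anywhere. It assumes the AWS provider’s `aws_security_group` resource with inline `ingress` blocks; the function name `open_ssh_groups` and the sample data are hypothetical, and a real scanner covers far more cases.

```python
import json


def open_ssh_groups(plan):
    """Return addresses of planned aws_security_group resources that allow
    port 22 from 0.0.0.0/0, given a parsed `terraform show -json` document."""
    flagged = []
    for rc in plan.get("resource_changes", []):
        if rc.get("type") != "aws_security_group":
            continue
        # "after" is null when the resource is being destroyed.
        after = (rc.get("change") or {}).get("after") or {}
        for rule in after.get("ingress") or []:
            cidrs = rule.get("cidr_blocks") or []
            from_port, to_port = rule.get("from_port"), rule.get("to_port")
            all_traffic = rule.get("protocol") == "-1"  # "-1" means every protocol and port
            in_range = (from_port is not None and to_port is not None
                        and from_port <= 22 <= to_port)
            if "0.0.0.0/0" in cidrs and (all_traffic or in_range):
                flagged.append(rc["address"])
                break
    return flagged


# A heavily trimmed, hand-written stand-in for real plan JSON.
SAMPLE_PLAN = json.loads("""
{
  "resource_changes": [
    {"address": "aws_security_group.ssh_open", "type": "aws_security_group",
     "change": {"actions": ["create"], "after": {"ingress": [
       {"protocol": "tcp", "from_port": 22, "to_port": 22,
        "cidr_blocks": ["0.0.0.0/0"]}]}}},
    {"address": "aws_security_group.internal", "type": "aws_security_group",
     "change": {"actions": ["create"], "after": {"ingress": [
       {"protocol": "tcp", "from_port": 22, "to_port": 22,
        "cidr_blocks": ["10.0.0.0/8"]}]}}}
  ]
}
""")

print(open_ssh_groups(SAMPLE_PLAN))  # prints ['aws_security_group.ssh_open']
```

To try this against a real plan, you could run `terraform plan -out=tfplan` followed by `terraform show -json tfplan > plan.json` and load that file instead of the sample. Values that are only known after apply do not appear in `after`, which is why the sketch skips rules with missing ports.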

Identify misconfigurations in runtime and remediate cloud drift

Moving beyond build time, cloud security posture management (CSPM) tools can identify misconfigurations at runtime. For example, if a security group leaves port 22 open, a CSPM tool will flag it at runtime. One thing to note is that with GitOps, fixes for misconfigurations should always be made in the IaC template and not directly to the running cloud resource. This way, you maintain consistency and can reap the scalability benefits of IaC.

Changes made directly to cloud resources will inevitably cause cloud drift. Cloud drift occurs when the configuration of your cloud environment gets out of sync with the IaC that previously defined it. This can occur for a variety of reasons. Sometimes, it is the result of a temporary change that winds up becoming permanent. For example, a team member might attempt to troubleshoot a problem within an application during a “break glass” moment. In this case, a change made directly to cloud infrastructure might resolve the problem, but if the change isn’t reverted, it could cause problems later on down the line.

In other cases, cloud drift is the result of a lack of IaC security knowledge. A security ops team member may not have enough experience with IaC frameworks, and instead may head straight to a cloud console or CLI to fix a misconfiguration. This will inevitably remove the auditability, collaboration, and repeatability benefits of IaC.

Drift detection can help you identify manual modifications made directly in the cloud provider platform. For example, you can use Bridgecrew to compare Terraform code against running cloud resource configurations across AWS, Azure, and GCP. To detect cloud drift, Bridgecrew leverages the open-source tool Yor, which tags every IaC resource in code with several valuable infrastructure tags. Once deployed, the running cloud resources carry those tags too, enabling you to quickly trace the running cloud resource state to the Terraform code stored in your VCS for comparison. If the change made should be permanent, you can easily add the drifted configuration into the code resource block in your VCS by selecting Fix Drift. If the drift is a temporary change that needs to be reverted, running terraform apply will bring the resources back in line.
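The comparison described above can be pictured with a small Python sketch. This is not how Bridgecrew or Yor is implemented. The record layout, the function names, and the use of a single `yor_trace` tag key as the link between code and cloud are assumptions made for illustration.

```python
TRACE_TAG = "yor_trace"  # assumed tag key linking a cloud resource to its code block


def find_drift(declared, running):
    """Return {attribute: (declared_value, running_value)} for every mismatch."""
    drift = {}
    for key in sorted(set(declared) | set(running)):
        if declared.get(key) != running.get(key):
            drift[key] = (declared.get(key), running.get(key))
    return drift


def drift_report(declared_resources, running_resources):
    """Match running resources to code by trace tag and diff their attributes."""
    by_trace = {
        r["tags"][TRACE_TAG]: r
        for r in declared_resources
        if TRACE_TAG in r.get("tags", {})
    }
    report = {}
    for live in running_resources:
        code = by_trace.get(live.get("tags", {}).get(TRACE_TAG))
        if code is None:
            continue  # cannot be traced back to code; out of scope for this sketch
        diff = find_drift(code["attributes"], live["attributes"])
        if diff:
            report[code["address"]] = diff
    return report


# Hypothetical data: versioning was switched off by hand in the console.
DECLARED = [{"address": "aws_s3_bucket.data", "tags": {TRACE_TAG: "abc-123"},
             "attributes": {"versioning": True, "acl": "private"}}]
RUNNING = [{"tags": {TRACE_TAG: "abc-123"},
            "attributes": {"versioning": False, "acl": "private"}}]

print(drift_report(DECLARED, RUNNING))
```

Each entry in the report pairs the value in code with the value found in the cloud, which is the information you need to decide whether to codify the change or revert it with terraform apply.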

Utilize custom policies

When it comes to developing consistent and secure infrastructure, there is no one-size-fits-all. Every product is unique and so are the goals for individual teams. Additionally, sometimes you may want to enforce policies that are different from those that cover broader regulations and industry benchmarks such as HIPAA or GDPR.

With Bridgecrew Custom Policies, you can create everything from the most basic checks to the most complex policies to fit your needs. Custom policies allow monitoring and enforcing of cloud infrastructure configuration in accordance with your organization’s specific needs—security or otherwise.

Custom policies make it easy for anyone to write policies that get enforced wherever infrastructure is governed. Most policy-as-code engines let you create your own custom policies. Bridgecrew and Checkov, for example, support custom policies written in YAML or built with our visual editor, so you don’t need to know a domain-specific language. By dedicating time upfront to defining your organization’s specific needs through custom policies, you will spend less time fixing misconfigurations after the fact.

For example, let’s say you want to make sure all compute instances created aren’t smaller than t3.small. A custom policy for that might look something like:
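As one way to picture the rule, the sketch below expresses it in plain Python against Terraform plan JSON rather than in Bridgecrew’s own policy format. It assumes AWS `aws_instance` resources and interprets “smaller” as an earlier position in the t3 size ladder; the function names and sample data are hypothetical, and other instance families are deliberately left out of scope.

```python
# t3 sizes in ascending order; other families are not ranked in this sketch.
T3_SIZES = ["nano", "micro", "small", "medium", "large", "xlarge", "2xlarge"]


def is_smaller_than(instance_type, minimum="t3.small"):
    """True if instance_type is a t3 size below the minimum."""
    family, _, size = instance_type.partition(".")
    min_family, _, min_size = minimum.partition(".")
    if family != min_family or size not in T3_SIZES:
        return False  # unknown family or size: not judged here
    return T3_SIZES.index(size) < T3_SIZES.index(min_size)


def undersized_instances(plan, minimum="t3.small"):
    """Return (address, instance_type) pairs that violate the minimum size."""
    flagged = []
    for rc in plan.get("resource_changes", []):
        if rc.get("type") != "aws_instance":
            continue
        after = (rc.get("change") or {}).get("after") or {}
        instance_type = after.get("instance_type")
        if isinstance(instance_type, str) and is_smaller_than(instance_type, minimum):
            flagged.append((rc["address"], instance_type))
    return flagged


# Hand-written stand-in for the output of `terraform show -json`.
SAMPLE_PLAN = {
    "resource_changes": [
        {"address": "aws_instance.web", "type": "aws_instance",
         "change": {"after": {"instance_type": "t3.micro"}}},
        {"address": "aws_instance.api", "type": "aws_instance",
         "change": {"after": {"instance_type": "t3.medium"}}},
    ]
}

print(undersized_instances(SAMPLE_PLAN))  # prints [('aws_instance.web', 't3.micro')]
```

A policy written in the scanner’s own format would express the same condition declaratively, as a resource type, an attribute, and the set of values that are allowed or disallowed.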

Manage access via IAM-as-code

Identity and access management (IAM) controls who is authorized to access each cloud resource. Administrators not following DevSecOps practices have to manage permissions and roles manually. This approach, however, is error-prone and tends to grant more permissions than strictly needed. It’s far easier to add a *

Defining IAM roles and permissions in Terraform is a great way to manage IAM sprawl and, with the right policies in place, to enforce security best practices such as the principle of least privilege.
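One simple least-privilege check is to look for wildcard actions in an IAM policy document before it is deployed. The Python sketch below does this for the standard AWS IAM JSON policy structure. The function name `wildcard_statements` and the sample policy are hypothetical, and treating every `service:*` action as a finding is a deliberately strict assumption.

```python
import json


def _as_list(value):
    """IAM allows a single value or a list; normalize to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def wildcard_statements(policy_json):
    """Return the indexes of Allow statements whose Action is '*' or 'service:*'."""
    policy = json.loads(policy_json)
    flagged = []
    for index, stmt in enumerate(_as_list(policy.get("Statement"))):
        if stmt.get("Effect") != "Allow":
            continue
        actions = _as_list(stmt.get("Action"))
        if any(a == "*" or a.endswith(":*") for a in actions):
            flagged.append(index)
    return flagged


# Hypothetical policy: the second statement grants every S3 action.
SAMPLE_POLICY = """
{
  "Version": "2012-10-17",
  "Statement": [
    {"Effect": "Allow", "Action": ["s3:GetObject"],
     "Resource": "arn:aws:s3:::example-bucket/*"},
    {"Effect": "Allow", "Action": "s3:*", "Resource": "*"}
  ]
}
"""

print(wildcard_statements(SAMPLE_POLICY))  # prints [1]
```

Because the policy lives in code, a check like this can run on every pull request instead of relying on someone to notice an overly broad grant in the console.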

Another advanced use case for implementing security best practices with Terraform is to use code to define identities and privileges for everyone and every service in the environment. That involves:

Removing any unneeded permissions.
Managing who can access each service and application.
Ensuring all permissions are granted and removed through automated processes.

It’s important to only provide direct infrastructure access to the right users. Permissions should be managed only by automation, and forbidding direct administrative access is the best way of guaranteeing that changes are applied cleanly.

Follow these guidelines to lock down direct access:

Complete all infrastructure and permission modifications through channels like GitOps, CI/CD pipelines, and Terraform. This way, access control is baked into the deployment plan.
Adopt the principle of least privilege. This grants the minimum permissions needed to deploy and run your applications.
Lock down every account, even the accounts Terraform uses to deploy the changes. You can see which permissions Terraform needs by running it in trace mode: TF_LOG=trace terraform apply.

If you aren’t using Terraform to manage your IAM yet, a tool like AirIAM can give you a head start. AirIAM is an open source tool for transforming sprawling AWS IAM configuration into right-sized Terraform code. It produces a plan to migrate existing permissions into a Terraform plan. Bridgecrew created AirIAM to promote immutable and version-controlled IAM management, replacing today’s manual and error-prone methods.

A more advanced way of managing access is to base it on attributes or tags. Attribute-based access control (ABAC) uses attributes or characteristics, such as environment, team, data classification, and others to determine access. ABAC is typically a better way to provide access than by role, since team members often work in a variety of teams and can require more or less access as a project evolves.

ABAC can be a challenge to maintain, however, especially when leveraging IaC. To make it work, tags need to be added consistently and enforced at both build time and runtime.
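A build-time tag check can be sketched in a few lines of Python against Terraform plan JSON. The required tag keys below are hypothetical examples of ABAC attributes, the function name `missing_tags` is made up for illustration, and the sketch only looks at resources whose planned values include a `tags` attribute.

```python
# Hypothetical tag keys an ABAC scheme might depend on.
REQUIRED_TAGS = {"environment", "team", "data_classification"}


def missing_tags(plan, required=REQUIRED_TAGS):
    """Return {resource address: sorted list of required tags it lacks}."""
    result = {}
    for rc in plan.get("resource_changes", []):
        after = (rc.get("change") or {}).get("after") or {}
        if "tags" not in after:
            continue  # resource type exposes no tags in this plan
        tags = after.get("tags") or {}
        missing = sorted(set(required) - set(tags))
        if missing:
            result[rc["address"]] = missing
    return result


# Hand-written stand-in for the output of `terraform show -json`.
SAMPLE_PLAN = {
    "resource_changes": [
        {"address": "aws_s3_bucket.reports", "type": "aws_s3_bucket",
         "change": {"after": {"tags": {"environment": "prod", "team": "data"}}}},
        {"address": "aws_instance.api", "type": "aws_instance",
         "change": {"after": {"tags": {"environment": "prod", "team": "platform",
                                       "data_classification": "internal"}}}},
    ]
}

print(missing_tags(SAMPLE_PLAN))  # prints {'aws_s3_bucket.reports': ['data_classification']}
```

Running the same comparison against the tags on live resources would cover the runtime half of the enforcement.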

Next Steps

In this post, we looked at several pro tips for reducing security risks within Terraform. Using Terraform is a step in the right direction, but it alone is not enough to guarantee your systems are secure. Bridgecrew has a number of tools that help improve Terraform security, including TerraGoat and AirIAM.

We invite you to continue learning about Terraform and IaC security. Try out our Terraform workshop and view the links below: