Infrastructure as Code is a key element of most top-performing engineering setups. It’s a big leap forward in the way Ops and Devs interact with their own infrastructure. Interestingly, many still disagree on its definition and best practices. This article will clearly describe IaC, looking at both its great benefits and its crucial limitations.
Infrastructure as Code, or IaC for short, is a fundamental shift in software engineering and in the way Ops think about the provisioning and maintenance of infrastructure. Although IaC has been a de facto industry standard for the past few years, many still seem to disagree on its definition, best practices, and limitations.
This article will walk through the evolution of this approach to infrastructure workflows and the related technologies that were born out of it. We will explain where IaC came from and where it is likely going, looking at both its benefits and key limitations.
Remember the Iron Age of IT, when you actually bought your own servers and machines? Me neither. It seems quite crazy now that infrastructure growth was limited by the hardware purchasing cycle. And since it would take weeks for a new server to arrive, there was little pressure to rapidly install and configure an operating system on it. People would simply slot a disc into the server and follow a checklist. A few days later it was available for developers to use. Again, crazy.
With the launch of AWS EC2 in 2006 and the widespread adoption of Ruby on Rails 1.0 at around the same time, many enterprise teams found themselves dealing with scaling problems previously only experienced at massive multinational organizations. Cloud computing and the ability to effortlessly spin up new VM instances brought a great deal of benefits for engineers and businesses, but it also meant they now had to babysit an ever-growing portfolio of servers.
The infrastructure footprint of the average engineering organization became much bigger, as a handful of large machines were replaced by many smaller instances. Suddenly, there were a lot more things Ops needed to provision and maintain, and this infrastructure tended to be cyclic. We might scale up to handle load during a peak day and then scale down at night to save on cost, because capacity is no longer a fixed item. Unlike owning depreciating hardware, we're now paying for resources by the hour. So it made sense to use only the infrastructure you needed in order to fully benefit from a cloud setup.
To leverage this flexibility, a new paradigm is required. Filing a thousand tickets every morning to spin up to our peak capacity and another thousand at night to spin back down, while managing all of this manually, quickly becomes unworkable. The question is then, how do we begin to operationalize this setup in a way that's reliable and robust, and not prone to human error?
Infrastructure as Code was born to answer these challenges in a codified way. IaC is the process of managing and provisioning data centers and servers through machine-readable definition files, rather than through physical hardware configuration or human-configured tools. Now, instead of having to work through a hundred different manual configuration steps, IaC allows us to simply run a script that every morning brings up a thousand machines and later in the evening automatically brings the infrastructure back down to whatever the appropriate evening size should be.
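The morning and evening scaling described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the schedule, the instance counts, and the function names are hypothetical, and a real setup would hand the result to a cloud provider or IaC tool rather than just computing a number.

```python
from datetime import time
from typing import List, Tuple

# Hypothetical capacity schedule: (start, end, instance count).
# The hours and sizes are assumptions made for this illustration.
SCHEDULE: List[Tuple[time, time, int]] = [
    (time(8, 0), time(20, 0), 1000),  # daytime peak
]
NIGHT_SIZE = 100  # assumed off-peak size


def desired_instances(now: time) -> int:
    """Return how many instances should be running at the given time of day."""
    for start, end, count in SCHEDULE:
        if start <= now < end:
            return count
    return NIGHT_SIZE


def scaling_delta(current: int, now: time) -> int:
    """Positive result: instances to start. Negative result: instances to stop."""
    return desired_instances(now) - current
```

A scheduler would call `scaling_delta` with the current fleet size and act on the result, so nobody has to file a ticket to reach the peak or the evening size.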
Ever since the launch of AWS CloudFormation in 2011, IaC has quickly become an essential DevOps practice, indispensable to a competitively paced software delivery lifecycle. It enables engineering teams to rapidly create and version infrastructure in the same way they version source code and to track these versions to avoid inconsistency among IT environments. Typically, teams implement it as follows:
And voilà, your infrastructure is suddenly working for you again instead of the other way around.
There are traditionally two approaches to IaC, declarative and imperative, and two possible methods, push and pull. The declarative approach describes the eventual target: it defines the desired state of your resources. This approach answers the question of what needs to be created, e.g. “I need two virtual machines”. The imperative approach answers the question of how the infrastructure needs to be changed to achieve a specific goal, usually through a sequence of commands. Ansible playbooks are an excellent example of an imperative approach. The difference between the push and pull methods lies simply in how the servers receive their configuration. In the pull method, the server to be configured pulls its configuration from the controlling server, while in the push method the controlling server pushes the configuration to the destination system.
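To make the declarative and imperative distinction concrete, here is a minimal Python sketch. It does not reproduce how any particular tool works; the resource names, the dictionary-based "state", and both functions are assumptions made for illustration only.

```python
from typing import Dict, List


def reconcile(desired: Dict[str, int], actual: Dict[str, int]) -> List[str]:
    """Declarative style: compare desired state with actual state and
    derive the actions needed to close the gap."""
    actions = []
    for kind in sorted(set(desired) | set(actual)):
        diff = desired.get(kind, 0) - actual.get(kind, 0)
        if diff > 0:
            actions.append(f"create {diff} {kind}")
        elif diff < 0:
            actions.append(f"delete {-diff} {kind}")
    return actions


def imperative_add_two_vms(actual: Dict[str, int]) -> Dict[str, int]:
    """Imperative style: a fixed sequence of commands that runs the same
    way regardless of what already exists."""
    state = dict(actual)
    state["vm"] = state.get("vm", 0) + 1  # command 1: create a VM
    state["vm"] = state["vm"] + 1         # command 2: create another VM
    return state
```

With “I need two virtual machines” as the desired state, `reconcile({"vm": 2}, {"vm": 2})` returns an empty list because nothing is left to do, whereas calling `imperative_add_two_vms` on an environment that already has two VMs leaves it with four. That difference in repeatability is the practical reason the two approaches are kept apart.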
The IaC tooling landscape has been in constant evolution over the past ten years, and it would probably take a whole other article to give a comprehensive overview of all the options available for applying this approach to a specific infrastructure. We have, however, compiled a quick timeline of the main tools, sorted by GA release date:
This is an extremely dynamic vertical of the DevOps industry, with new tools and competitors popping up every year and old incumbents constantly innovating. CloudFormation, for instance, got a nice new feature just last year: CloudFormation modules.
Thanks to such a strong competitive push to improve, IaC tools have time and again innovated to generate more value for the end-user. The largest benefits for teams using IaC can be clustered in a few key areas:
There are more specific advantages to particular setups, but these are in general where we see IaC having the biggest impact on engineering teams’ workflows. And it’s far from trivial: introducing IaC as an approach to manage your infrastructure can be a crucial competitive edge. What many miss when discussing IaC, however, are some of the important limitations that it still brings with it. If you have already implemented IaC at your organization or are in the process of doing so, you’ll know it’s not all roses like most blog posts about it will have you believe. For an illustrative (and hilarious) example of the hardships of implementing an IaC solution like Terraform, I highly recommend checking out “The terrors and joys of terraform” by Regis Wilson.
In general, introducing IaC also implies four key limitations one should be aware of:
Once again, these are not the only drawbacks of rolling out IaC across your company but are some of the more acute pain points we witness when talking to engineering teams.
As mentioned, the IaC market is in a state of constant evolution and new solutions to these challenges are already being experimented with. As an example, Open Policy Agent (OPA) at present provides a good answer to the lack of a defined RBAC model in Terraform and is default in Pulumi.
The biggest question, though, remains the need for everyone in the engineering organization to understand IaC (language, concepts, etc.) to fully operationalize the approach. In the words of our CTO Chris Stephenson, “If you don’t understand how it works, IaC is the biggest black box of them all”. This creates a mostly unsolved divide between Ops, who are trying to optimize their setup as much as possible, and developers, who are often afraid of touching IaC scripts for fear of messing something up. The result is all sorts of frustrations and waiting times.
There are two main routes that engineering teams currently take to address this gap:
Neither of these approaches really closes the gap between Ops and devs. Both are still shaky or inflexible. Looking ahead, Internal Developer Platforms (IDPs) can bridge this divide and provide an additional layer between developers and IaC scripts. By allowing Ops to set clear rules and golden paths for the rest of the engineering team, IDPs enable developers to conveniently self-serve infrastructure through a UI or CLI, with the infrastructure provisioned under the hood by IaC scripts. Developers only need to worry about what resources (DB, DNS, storage) they need to deploy and run their applications, while the IDP takes care of calling IaC scripts through dedicated drivers to serve the desired infrastructure back to the engineers.
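As a rough illustration of the driver idea, the Python sketch below maps resource types to functions that would call the underlying IaC. Everything in it is hypothetical: the registry, the decorator, and the returned strings are assumptions made for this example and do not represent Humanitec's API or any real driver.

```python
from typing import Callable, Dict, List

# Hypothetical registry maintained by Ops: resource type -> driver function.
DRIVERS: Dict[str, Callable[[str], str]] = {}


def driver(resource_type: str):
    """Register a function as the driver for one resource type."""
    def register(fn: Callable[[str], str]) -> Callable[[str], str]:
        DRIVERS[resource_type] = fn
        return fn
    return register


@driver("postgres")
def provision_postgres(app: str) -> str:
    # A real driver might apply an IaC module here; this stub only describes it.
    return f"IaC module 'postgres' applied for {app}"


@driver("dns")
def provision_dns(app: str) -> str:
    return f"IaC module 'dns' applied for {app}"


def request_resources(app: str, resource_types: List[str]) -> List[str]:
    """What a developer asks for: resource types only. Drivers do the rest."""
    results = []
    for rtype in resource_types:
        if rtype not in DRIVERS:
            raise ValueError(f"no golden path defined for resource type: {rtype}")
        results.append(DRIVERS[rtype](app))
    return results
```

A developer would call something like `request_resources("shop", ["postgres", "dns"])` without ever opening the IaC scripts, while a request for a type Ops has not defined fails early instead of producing ad hoc infrastructure.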
We believe IDPs are the next logical step in the evolution of Infrastructure as Code. Humanitec is a framework to build your own Internal Developer Platform. We are soon publishing a library of open-source drivers that every team can use to automate their IaC setup. Stay tuned to find out more at https://github.com/Humanitec.