Jens Segers on Jul 22, 2013

Infrastructure as Code

For my master's thesis I researched infrastructure automation with configuration management software. I discovered the power and advantages of these tools and would like to share my findings about Infrastructure as Code with my readers. During my research I had the chance to meet Patrick Debois, who seems to be the master Yoda of the DevOps community, while I am still a young Padawan. His talks definitely influenced my thesis, and probably even this post.

The availability of "infinite" compute power through virtual server technologies, combined with a new generation of web frameworks, led to a new world of scaling problems, boosted by the launch of Amazon Elastic Compute Cloud (EC2) in 2006. Flexible software development methodologies such as Continuous Integration lead to faster release cycles and make the work of the operations manager or release coordinator a lot harder: they have to make sure these releases are correctly deployed onto the production environment. It is important that not only the software itself is tested, but also its behavior in that environment.

All of these changes led to a new breed of configuration management tools for automating the infrastructure. These tools use the Infrastructure as Code paradigm, which models your infrastructure in code by breaking it down into components that interact with each other. In this context, infrastructure isn't anything physical (or virtualized) like servers or networks, but the collection of components that are used to deliver your service or product to the end user: configuration files, software packages, user accounts, processes, ...

[Image: infrastructure components]

Every component type is abstracted, just like an abstract class, which allows you to describe the end-state of instances of that type. For example, a software package object will have an install, upgrade and remove method. This means administrators no longer have to worry about specific implementation details, and can focus on the functionality and cooperation of the abstracted components. A blueprint is designed that is then executed by a provider on the actual device. The provider takes care of the methods' implementation and translates operations into commands for the underlying operating system. The code becomes operating system independent: a provider for a Debian operating system will translate the install method of a software package to the apt-get command, while one for CentOS will translate it to the yum command.
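To make this concrete, here is a minimal sketch of the idea (not the API of any real tool like Chef or Puppet; all class and function names are illustrative): an abstract package provider with OS-specific implementations that translate the same install method into different commands.

```python
# Illustrative sketch: one abstract component type, multiple providers.
class PackageProvider:
    """Abstract provider: translates component methods into OS commands."""
    def install(self, name):
        raise NotImplementedError

    def remove(self, name):
        raise NotImplementedError

class AptProvider(PackageProvider):
    """Provider for Debian-family systems."""
    def install(self, name):
        return ["apt-get", "install", "-y", name]

    def remove(self, name):
        return ["apt-get", "remove", "-y", name]

class YumProvider(PackageProvider):
    """Provider for CentOS/RHEL-family systems."""
    def install(self, name):
        return ["yum", "install", "-y", name]

    def remove(self, name):
        return ["yum", "remove", "-y", name]

def provider_for(os_family):
    """Pick the provider that matches the detected operating system."""
    return {"debian": AptProvider, "rhel": YumProvider}[os_family]()

# The blueprint only says *what* should exist; the provider decides *how*:
print(provider_for("debian").install("nginx"))  # apt-get command on Debian
print(provider_for("rhel").install("nginx"))    # yum command on CentOS
```

The blueprint code stays identical across operating systems; only the provider that is selected at run time differs.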

All the method implementations are idempotent, which means that they can be applied multiple times without changing the result beyond the initial application. If you were to execute the install method of a software package multiple times, it would only be installed once. These idempotent operations allow a non-optimal execution order, whereby the desired system end-state is reached by convergence.
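A toy illustration of idempotence (again just a sketch, with a set standing in for the system's package database): the operation only acts when the desired end-state has not been reached yet, so running it twice leaves the system unchanged.

```python
# Toy "package database": a set of installed package names.
installed = set()

def ensure_installed(name):
    """Idempotent install: only act if the end-state isn't reached yet."""
    if name in installed:
        return False          # already converged, nothing to do
    installed.add(name)       # apply the change once
    return True               # a change was made on this run

print(ensure_installed("php5-cli"))  # True: first run installs the package
print(ensure_installed("php5-cli"))  # False: second run is a no-op
print(installed)                     # {'php5-cli'}
```

Because every operation behaves this way, the tool can safely re-run an entire blueprint from the top instead of tracking exactly which steps already ran.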

Periodically running these idempotent operations results in auditing: if, for some reason, a component deviates from its described end-state, the code will restore the component (and the system).
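The audit loop can be sketched as follows (a hypothetical example, not real tool code): compare the described end-state with the actual state, and correct any component that has drifted.

```python
# Described end-state versus actual state; mysql has drifted.
desired = {"nginx": "running", "mysql": "running"}
actual = {"nginx": "running", "mysql": "stopped"}

def converge(desired, actual):
    """Restore drifted components and report the corrections made."""
    fixes = {}
    for component, state in desired.items():
        if actual.get(component) != state:
            fixes[component] = state   # record the correction
            actual[component] = state  # restore the component
    return fixes

print(converge(desired, actual))  # {'mysql': 'running'}: drift corrected
print(converge(desired, actual))  # {}: already converged, nothing to do
```

Scheduling this loop (for example every 30 minutes) is what turns a one-shot provisioning script into continuous auditing.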

[Image: infrastructure auditing]

Because the infrastructure is modeled in code, we can use the same tools and principles as we would in modern software development. Using Version Control Systems like git, we introduce versions of the infrastructure that can be maintained alongside the application code. If a new unstable version breaks something, the infrastructure can be reverted to a previous stable version. You can also easily spin up an exact (virtual) duplicate of your production server to test your software application before deploying it.

> git log
d1b89d90ec07e08697acada12cec79f3b812e01c Updated ruby version
9256488f721eccdc245059edb9ff9b2b51aea24b Added SSL to nginx template
9b72807027f96bbbb929207b5bd8a92becfeb889 Install MariaDB and PhpMyAdmin
5f56e2e4580a274b4f018d75f3d124a48a5b0f1d Added php5-cli to installed packages

By describing all components and their interaction with each other, your whole infrastructure becomes documented. This is especially useful when multiple people are involved in the management of the infrastructure.

The tools

[Image: infrastructure tools]

The big three of open-source configuration management tools are CFEngine, Puppet and Chef. The answer to "which tool should I use?" is actually "yes". A lot of blogs compare these tools and suggest you use one of them based on a likely outdated argument, but in reality it is not important which tool you use, as long as you use one. The main competitor of all of these tools remains the in-house bash script. Maintaining such scripts for every combination of role, hardware and operating system is a time-consuming task, and these tools will make your work so much easier.

The main advice I would give anyone who wants to look into these tools is to go for either Puppet or Chef. The open-source version of CFEngine feels more like a framework than a fully featured tool. In the end I went for Chef because it met the requirements of my use case, but I might give Puppet a second try in the near future.


Comments

Jens Segers 5 months ago

@Charlene, unfortunately, my paper was in Dutch.


Charlene 5 months ago

Hi, I'm also looking into infrastructure automation for my degree. Would you mind sharing your thesis paper? I would love to read it :-)


Jens Segers 1 year ago

@Daniel, do you mean machines with different operating systems? In that case, yes: you need a package repository that has the specified version for each operating system, or you have to build it manually.


Daniel 1 year ago

One thing I don't understand is how idempotency is ensured across virtual machines provisioned with the same recipes at different times. For example if I have a recipe which installs php with "package php" how can I be sure that any machine provisioned with this recipe at different times will have the same package version? As far as my research went the only way to ensure this in an auto-scaling machine stack is to compile the desired version of the software (and maybe create an image from that for later use) or have your own package repository (hard to maintain imo).


Jens Segers 1 year ago

@Dieter_be, of course! I did not want to write yet another Chef vs. Puppet vs. ... blog post, and neither did I suggest you should just pick one of these tools at random. I want people to look into these tools and discover their advantages, instead of creating and maintaining bash scripts.


Dieter_be 1 year ago

The answer to "which tool should I use?" is actually "yes". A lot of blogs compare these tools and suggest you to use one of them based on a likely outdated argument, but in reality, it is not important which tool you use, as long as you use one

This seems to be very commonly repeated advice. Personally, I disagree. Puppet and Chef (and Salt, Ansible), and hence also their modules and recipes, have different philosophies and implementations, and that definitely has consequences for the user (you, the sysadmin). Also, their feature sets have somewhat diverged.


mestachs 1 year ago

Another effect of infrastructure as code: that kind of code also deserves our coding best practices. IDE, issue tracker, pull requests / code reviews, unit testing, integration testing, continuous deployment, ... whatever tool you pick ;)

https://speakerdeck.com/mestachs/how-do-you-puppet


Jens Segers 1 year ago

I did come across Salt during my research, as well as some others like Ansible. But I did not really look into them, as I was more focused on Puppet and Chef because of their extensive documentation.


tchap 1 year ago

Have you researched Salt Stack as well? It's been getting a lot of traction recently, I think...