This blog post is part of the Mixmax 2017 Advent Calendar. The previous post, from December 3rd, was Handling 3rd-party JavaScript with Rollup.
tl;dr - We use Terraform for almost everything and we're never looking back.
The problem
Have you ever tried to navigate around the AWS UI to hunt down a configuration issue? Perhaps someone accidentally clicked the wrong button, and suddenly one of your high-throughput ElastiCache Redis deployments is downsizing to a t2.medium, just because... This isn't the fault of you or your team - you can't blame anyone for being overwhelmed by the sheer amount of UI there is in the AWS web console.
I'm not even going to talk about the hours upon hours it took me to reorient myself after the AWS services dropdown was reorganized earlier this year. Managing infrastructure is hard, and if you're managing it through a UI, can you really expect it to get any easier? Absolutely not. The sheer number of knobs and dials you can turn when configuring any system is overwhelming, and it makes it easy to miss a confirmation modal - or to not realize there wasn't one. Configuration mistakes aren't just easy to make; you're always one click away from a mistake you might never notice (such as making an S3 bucket publicly accessible).
Is it all hopeless?
Fear not! Configuration can also be done with, well you know, configuration files. But why stop there? Why not drink a little more of the Kool-Aid and begin to version your infrastructure? While "Infrastructure as code" can seem terrifying and daunting to implement, we're here to tell you that it's very easy to incrementally roll out across your infrastructure.
Side note: What is Infrastructure as code?
Infrastructure as code, technically, means configuring infrastructure with an automated system instead of configuring it manually. So instead of manually going to the AWS UI and clicking some buttons, or hopping on a server and fiddling with some config files, you make changes to machine-readable files that your automated system can then use to apply those changes for you. The utility of such a system becomes very apparent when a few additional lines of configuration code can be used to modify your entire server fleet.
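To make that concrete, here's a minimal, hypothetical sketch of what "infrastructure in a machine-readable file" looks like with Terraform (the AMI id and the "worker" name are made up for illustration): change instance_type, run terraform apply, and the real instance changes - no console required.

# Hypothetical example: an EC2 instance described entirely in code.
# Changing instance_type and re-running `terraform apply` resizes the
# real instance - no clicking around the console.
resource "aws_instance" "worker" {
  ami           = "ami-0123456789abcdef0" # made-up AMI id
  instance_type = "t2.medium"

  tags = {
    Name = "example-worker"
  }
}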
Moving from a manually managed system to a fully automated one can seem daunting because it can be incredibly difficult to identify how to even begin the migration process. Not only that, but it can be difficult to find a low stakes environment in which to begin to test the waters without committing your entire infrastructure to the new process.
Infrastructure as code: start with the little pieces
At Mixmax, we use many of AWS's services - from Elastic Beanstalk and ElastiCache all the way through CloudWatch and DynamoDB. We knew it wouldn't be feasible to move our entire world to a versioned configuration system in one fell swoop, so we wanted a tool that would let us incrementally bring our infrastructure under version control. For us, that made HashiCorp's Terraform a no-brainer: we could start by using Terraform to manage small deployments of non-application-level systems before committing to managing our application services with it. To migrate incrementally, we began by moving the fairly static components of our infrastructure under Terraform's control. There are many other tools in this space, but most are primarily application configuration systems that were retroactively bootstrapped into provisioning tools, whereas Terraform has been a flexible provisioning tool from the start.
First we moved our CloudWatch alarms and our SNS topics and subscriptions to be controlled via Terraform. Using Terraform modules for this was so successful that engineers who previously never wanted to touch CloudWatch alarms began to create them with glee! We'd turned a painful part of our development process into something our team now finds a joy to work with. After that success, we decided to try something with higher stakes and moved our ElastiCache Redis deployments to be provisioned and managed via Terraform. Again, Terraform modules made this a breeze.
Why do Terraform modules make this so simple? Well, let's look at an example. We use CloudWatch alarms across our entire infrastructure in many different applications, but one specific use is tracking the number of delayed, inactive, and failed jobs in our job queueing system, bee-queue. Before, engineers would have to either manually make alarms in the AWS UI or run a script that wasn't fully intuitive to use. More than once, we'd ended up with only two of the alarms existing, the third having been forgotten. With Terraform modules, though, creating three alarms is super simple:
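Here's a sketch of what that call looks like - a single module block, using the module's default thresholds (the names here are illustrative and mirror the fuller example later in the post):

# Illustrative: one module block creates the delayed, failed, and inactive
# alarms, all with the module's default thresholds.
module "process-cool-event-job-queue-alarms-bee-queue" {
  source       = "./modules/job_queue_alarms"
  alarm_name   = "process-cool-event"
  ok_action    = "${var.high-priority-ok-action}"
  alarm_action = "${var.high-priority-alarm-action}"
}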
It's really that simple - one segment of code for three alarms! How does this work, though? Well, let's look at the structure of the job_queue_alarms folder.
job_queue_alarms/
  main.tf
  delayed/
    main.tf
  failed/
    main.tf
  inactive/
    main.tf
The root main.tf in the job_queue_alarms directory then looks like:
variable "alarm_name" {}
variable "ok_action" {}
variable "alarm_action" {}
variable "delayed_threshold" {
default = "100.0"
}
variable "failed_threshold" {
default = "100.0"
}
variable "inactive_threshold" {
default = "100.0"
}
module "too-many-failed-jobs-bee-queue" {
source = "./failed"
alarm_name = "${var.alarm_name}"
ok_action = "${var.ok_action}"
alarm_action = "${var.alarm_action}"
threshold = "${var.failed_threshold}"
}
module "too-many-delayed-jobs-bee-queue" {
source = "./delayed"
alarm_name = "${var.alarm_name}"
ok_action = "${var.ok_action}"
alarm_action = "${var.alarm_action}"
threshold = "${var.delayed_threshold}"
}
module "too-many-inactive-jobs-bee-queue" {
source = "./inactive"
alarm_name = "${var.alarm_name}"
ok_action = "${var.ok_action}"
alarm_action = "${var.alarm_action}"
threshold = "${var.inactive_threshold}"
}
Meanwhile, each main.tf inside the child directories looks like:
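As a sketch, here's roughly what failed/main.tf could look like, assuming the job counts are published as a custom CloudWatch metric (the JobQueues namespace, the metric name, and the evaluation periods below are assumptions for illustration, not our exact configuration):

# Sketch of failed/main.tf; the delayed/ and inactive/ modules would differ
# only in the metric they watch. Namespace and metric names are assumed here.
variable "alarm_name" {}
variable "ok_action" {}
variable "alarm_action" {}
variable "threshold" {}

resource "aws_cloudwatch_metric_alarm" "too_many_failed_jobs" {
  alarm_name          = "${var.alarm_name}-too-many-failed-jobs-bee-queue"
  alarm_description   = "Too many failed bee-queue jobs for ${var.alarm_name}"
  namespace           = "JobQueues"                  # assumed custom namespace
  metric_name         = "${var.alarm_name}-failed-jobs" # assumed custom metric
  statistic           = "Maximum"
  period              = "60"
  evaluation_periods  = "5"
  comparison_operator = "GreaterThanThreshold"
  threshold           = "${var.threshold}"
  alarm_actions       = ["${var.alarm_action}"]
  ok_actions          = ["${var.ok_action}"]
}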
Phew! There's a lot going on here! The general gist, though, is that by using variables, we can create reusable components that we can then combine to create multiple resources at a time! In our previous example of using the job_queue_alarms module, we used the default threshold values - but what if we wanted custom thresholds? In that case, we'd do something like this:
module "process-cool-event-job-queue-alarms-bee-queue" {
source = "./modules/job_queue_alarms"
alarm_name = "process-cool-event"
ok_action = "${var.high-priority-ok-action}"
alarm_action = "${var.high-priority-alarm-action}"
delayed_threshold = "500.0"
failed_threshold = "50.0"
inactive_threshold = "120.0"
}
Et voilà! By using variables with default values, we can override them whenever we call the module, allowing for a very high degree of control over otherwise very similar resources.
But wait, there's more!
As we began to use Terraform for more and more across our AWS infrastructure, we realized Terraform can be used to provision and configure anything, as long as there's a Terraform provider for it. Giddy with excitement, we quickly began to Terraform our PagerDuty schedules and service alarms! On the surface this might seem excessive, but it has huge benefits. By Terraforming our PagerDuty alarms, we can create new CloudWatch alarms for a service at the same time that we create the corresponding PagerDuty service - and programmatically connect the two in the same change.
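To give a flavor of that wiring, here's a rough sketch (the resource names, the escalation policy variable, and the exact settings are assumptions, not our real config): the PagerDuty provider creates the service and its CloudWatch integration, and the resulting integration key feeds straight into the SNS subscription that the CloudWatch alarms notify.

# Sketch: create a PagerDuty service plus a CloudWatch integration, then point
# an SNS topic at it so CloudWatch alarms page on-call. Names are illustrative.
data "pagerduty_vendor" "cloudwatch" {
  name = "Amazon CloudWatch"
}

resource "pagerduty_service" "cool_event_worker" {
  name              = "process-cool-event"
  escalation_policy = "${var.escalation_policy_id}" # assumed variable
}

resource "pagerduty_service_integration" "cloudwatch" {
  name    = "${data.pagerduty_vendor.cloudwatch.name}"
  service = "${pagerduty_service.cool_event_worker.id}"
  vendor  = "${data.pagerduty_vendor.cloudwatch.id}"
}

# The SNS topic that the CloudWatch alarms use as their alarm_action.
resource "aws_sns_topic" "cool_event_alarms" {
  name = "process-cool-event-alarms"
}

resource "aws_sns_topic_subscription" "pagerduty" {
  topic_arn              = "${aws_sns_topic.cool_event_alarms.arn}"
  protocol               = "https"
  endpoint               = "https://events.pagerduty.com/integration/${pagerduty_service_integration.cloudwatch.integration_key}/enqueue"
  endpoint_auto_confirms = true
}

Because the integration key is just another Terraform attribute, the CloudWatch alarm, the SNS topic, and the PagerDuty service can all come up together in a single apply.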
What should I take away from this?
Infrastructure as code is incredible, but you shouldn't feel like you have to migrate the world all at once. We've found that by incrementally moving our infrastructure to a versioned provisioning system, we've had not only widespread adoption internally but also an increase in interest in getting involved with infrastructure work. At Mixmax, we're not using Terraform for everything yet, but we're enjoying the process of seeing how it's making everyone's lives easier while we continue to roll its usage out across our systems.
Enjoy building smarter infrastructure in an intelligent way instead of wrangling the AWS UI? Drop us a line.