Migrating production services to AWS Elastic Beanstalk without downtime

Friday, Oct 23rd, 2015

Mixmax is a communications platform that brings professional communication & email into the 21st century.

Mixmax started out as a monolithic Meteor application hosted on a PaaS provider (Modulus.io) that specialized in deploying NodeJS and Meteor applications. As our traffic and userbase grew, we quickly ran into scaling problems with the Meteor framework, poorly optimized prototype code, and the limits of our cloud hosting provider. We’ve blogged about some of our scaling projects previously: Scaling Mixmax: Front-end performance, Unicode woes in Javascript, and more. In addition to scaling difficulties, there were features that we were looking for that Modulus.io didn’t offer at the time. We investigated other PaaS providers like Heroku, but eventually decided that moving to AWS would give us the best feature set immediately and put us in the right direction moving forward as well.

Working with AWS isn’t easy. Even reading the developer documentation can be headache-inducing. Even if you are experienced with AWS, you still have dozens of options to consider for server configuration and deployment. Any googling of the problem space quickly devolves into madness. Read CircleCI’s “It’s the future” for a taste of how deep the rabbit hole goes.

In the end we decided to use AWS Elastic Beanstalk, a PaaS offering built by AWS. Elastic Beanstalk handles configuring the servers, deploying code, and provides a nice dashboard UI to manage your projects. The dashboard isn’t as easy to use as other PaaS offerings, but the other features of Elastic Beanstalk are very compelling. Here is a short list of the features offered by Elastic Beanstalk that we found to be superior to our current PaaS provider.

  • integrated CloudWatch monitoring
  • zero-downtime deploys (batched deploy groups)
  • more flexible CPU/memory offerings
  • reliability of load balancer (Elastic Load Balancer)

Mixmax is a real-time communications product with users all over the world. We knew from the start that this migration should be done in stages to minimize downtime of crucial systems. The plan for migration was simple:

  • Step 1: Bring up an identical AWS environment
  • Step 2: Change the DNS records from the Modulus.io hosts to the AWS Elastic Load Balancer.
  • Step 3: After DNS propagation completes, turn off the old environment.

Step 1: Bring up an identical AWS environment

The first step was to bring up an AWS environment side-by-side with our existing production environment. The basic outline was as follows:

  • configure Elastic Beanstalk environment
  • configure build and deployment tools

Elastic Beanstalk’s setup wizard is probably one of AWS’ best offerings to date. Under the hood the setup wizard configures and launches EC2 instances, configures an ELB, and sets up basic CloudWatch alarms. Elastic Beanstalk also offers a CLI tool similar to Heroku’s for even faster environment setup.

Our services have staging and production environments that get deployed automatically based on commits to the corresponding GitHub repo. Fortunately our CI provider Codeship supported multiple deployment destinations, though we found Codeship’s default Elastic Beanstalk integration wasn’t great. It required a new S3 bucket to store the code to be deployed, along with a custom IAM policy that bridges S3 and Elastic Beanstalk. We opted to use Elastic Beanstalk’s CLI tool directly in our deploy script instead.
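To illustrate the idea of calling the EB CLI from a deploy script, here is a minimal sketch. It is not a definitive implementation: the branch names, environment names, and label format are hypothetical, and it assumes the `eb` CLI is installed on the build machine and the project has already been set up with `eb init`.

```python
import subprocess

# Hypothetical mapping from Git branch to Elastic Beanstalk environment name.
BRANCH_TO_ENVIRONMENT = {
    "master": "myservice-production",
    "staging": "myservice-staging",
}


def build_deploy_command(branch, commit_sha):
    """Return the `eb deploy` argument list for a branch, or None to skip."""
    environment = BRANCH_TO_ENVIRONMENT.get(branch)
    if environment is None:
        return None  # Branch is not tied to an environment; nothing to deploy.
    # Label the application version with the short commit SHA for traceability.
    return ["eb", "deploy", environment, "--label", f"build-{commit_sha[:7]}"]


def deploy(branch, commit_sha):
    """Run the deploy and raise if the EB CLI exits with a non-zero status."""
    command = build_deploy_command(branch, commit_sha)
    if command is not None:
        subprocess.run(command, check=True)
```

Keeping the command construction separate from the `subprocess` call makes the branch-to-environment logic easy to check without touching AWS. A CI step would then call something like `deploy(branch, commit_sha)` with values taken from the build environment.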

After ensuring that Elastic Beanstalk and Codeship configurations were correct and worked as expected we were able to proceed to the next step.

Step 2: Switch the DNS records

Most of our services expose an HTTP REST API. Flipping the switch between the old environment and the new environment was as simple as changing the DNS records. One lesson we learned the hard way: when setting up the DNS record for your Elastic Beanstalk project, use the Elastic Beanstalk URL and not the Elastic Load Balancer URL. Each Elastic Beanstalk project exposes a rebuild environment action that will tear down every EC2 instance along with the ELB. If you use the Elastic Load Balancer CNAME URL and someone accidentally clicks the rebuild environment button, your project will be unreachable until you point your DNS records at the new ELB and that change propagates.
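One way to guard against this mistake is to audit your DNS records for CNAMEs that point straight at a load balancer. The sketch below is only an illustration: it assumes the default AWS hostname suffixes (`.elasticbeanstalk.com` for environment URLs and `.elb.amazonaws.com` for load balancers), and the record names and targets in the usage example are hypothetical.

```python
def classify_dns_target(hostname):
    """Classify a CNAME target as an EB environment URL, an ELB URL, or other."""
    name = hostname.lower().rstrip(".")  # Tolerate a trailing root dot.
    if name.endswith(".elasticbeanstalk.com"):
        return "elastic-beanstalk"
    if name.endswith(".elb.amazonaws.com"):
        return "elastic-load-balancer"
    return "other"


def risky_records(records):
    """Return the record names whose CNAME points directly at an ELB."""
    return [
        name
        for name, target in records.items()
        if classify_dns_target(target) == "elastic-load-balancer"
    ]
```

For example, given `{"api.example.com": "myservice-production.elasticbeanstalk.com", "hooks.example.com": "awseb-e-a-AWSEBLoa-ABC123-1234567890.us-east-1.elb.amazonaws.com"}`, only `hooks.example.com` would be flagged, because a rebuilt environment would get a new ELB hostname and leave that record dangling.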

Step 3: Turn off the old environments after DNS propagation

People use Mixmax from all over the world, and DNS propagation to remote locations may not happen quickly. Our DNS records had a TTL of 3600 seconds, but we were still recording traffic on our old environments for well over an hour after the DNS change. We opted to leave the old environments running for 24 hours after the DNS change. Running two environments for 24 hours wasn’t an engineering burden because our deployment process automatically pushed to both environments.
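The shutdown decision can be written down as a small check. This is a sketch under stated assumptions rather than a description of any tooling we ran: the 3600-second TTL and 24-hour wait come from the numbers above, while the one-hour "quiet period" with no traffic on the old environment is a hypothetical extra safeguard.

```python
from datetime import datetime, timedelta


def safe_to_shut_down(now, dns_changed_at, last_old_request_at,
                      ttl=timedelta(seconds=3600),
                      grace=timedelta(hours=24),
                      quiet=timedelta(hours=1)):
    """True once the wait has elapsed and the old environment has gone quiet.

    last_old_request_at is the time of the most recent request seen by the
    old environment, or None if it has received no traffic at all.
    """
    # Never wait less than the TTL, even if the grace period is set lower.
    waited_long_enough = now - dns_changed_at >= max(ttl, grace)
    # Hypothetical safeguard: require a stretch with no traffic on the old hosts.
    gone_quiet = (
        last_old_request_at is None
        or now - last_old_request_at >= quiet
    )
    return waited_long_enough and gone_quiet
```

Because resolvers do not always honor TTLs, the check treats the TTL as a lower bound and relies on observed traffic, not the TTL alone, to decide when the old environment is really idle.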

After the dust settled

As much as two PaaS offerings can appear similar on paper, in practice the differences turned out to be far larger than we expected. Some of the advantages offered by Elastic Beanstalk are game changing. Accurate minute-by-minute monitoring from CloudWatch has given us much more transparency into how our application code runs in production. CloudWatch Alarms are insanely customizable, and some quick configuration resulted in fewer false-positive PagerDuty incidents. Zero-downtime deploys have become crucial as our traffic increases.

After this migration, the majority of our microservices are hosted on Elastic Beanstalk. We still use Modulus.io to host our Meteor application because Elastic Beanstalk doesn’t offer a turnkey solution for SSL, websockets, and session affinity. We learned a lot from this migration and going forward we’re confident we can migrate again if needed.

Tweet at us @Mixmax with your opinions on deploying with AWS vs other PaaS/IaaS offerings. If you enjoy working on a real-time communication product with a global userbase, drop us a line at careers@mixmax.com. We’re based in San Francisco, CA and are looking for engineers who care about distributed, real-time systems and building a product for the global community.