Upgrading to Node 6 on Elastic Beanstalk

And speeding up npm install by 95%!

Friday, Dec 2nd, 2016

Mixmax is a communications platform that brings professional communication & email into the 21st century.

This blog post is part of the Mixmax 2016 Advent Calendar. The previous post on December 1st was about Mixmax’s open-source culture.

In case you haven’t heard, Node 6 went LTS mid-October, with AWS Elastic Beanstalk adding support at the end of the month. Since Node 6 promised support for 99% of ES6 features as well as a host of performance and security improvements, we moved quickly to adopt it. We found it to be very easy to upgrade locally—we only had to upgrade a few native dependencies to their latest version to pick up new bindings, and did not have to change any code. Kudos to the Node Foundation for a stable release and the community for embracing Node 6 well in advance of LTS.

It was not as easy to upgrade Elastic Beanstalk, however—upgrading the platform version persistently resulted in stuck deploys and rollbacks. Debugging this required exploring Elastic Beanstalk’s inner workings, but we ultimately made fixing it as simple as installing a Node package. And not only did that package enable us to upgrade Elastic Beanstalk to Node 6, but it also sped up npm install by 95%. Here’s how we did it.

What went wrong

We initially tried to upgrade the platform version in place using the upgrade button on our application’s dashboard. But the configuration deploy never finished—boxes would just time out. We then cloned the environment with the latest platform. This initially succeeded, only for further deploys to fail.

Watching these deploys fail was agonizing. Elastic Beanstalk’s dashboard gives very little insight into what’s going on during a deploy. But you can easily SSH into the EC2 instances and tail the deployment logs. Using the EB CLI tool:

eb ssh -i <instance id>
tail -f /var/log/eb-activity.log

This revealed that the boxes were getting stuck running npm install:

[2016-12-02T01:17:44.287Z] INFO  [27173] - [Application update app-91d6-161202_011650-stage-161202_011650@28/AppDeployStage0/AppDeployPreHook/50npm.sh] : Starting activity...

We were perplexed. EB’s docs said that platform 3.1.0 was using npm 2.15.5, same as the previous platform. What was the difference?

We quickly suspected that EB’s docs were wrong, since Node 6.9.1 usually ships with npm 3.10.8. We confirmed this on an EC2 instance:

[ec2-user@ip-10-20-4-104 ~]$ export PATH=/opt/elasticbeanstalk/node-install/node-v6.9.1-linux-x64/bin:$PATH
[ec2-user@ip-10-20-4-104 ~]$ /opt/elasticbeanstalk/node-install/node-v6.9.1-linux-x64/bin/npm -v
3.10.8

(Note: we reported this to AWS on 11/11/2016, but as of 12/01/2016 the docs are still wrong 😞.)

We upgraded to npm 3 locally and timed npm install in several of our projects. We found that npm 3.10.8 consistently takes about 2x longer to run npm install than npm 2.15.5. And on the resource-constrained EC2 instances, it was taking even longer—much longer than the command timeout. Cloning the environment appeared to fix the problem only because EB uses a longer timeout when creating an environment than when deploying configuration changes to an existing environment.

So the fix was going to involve downgrading npm 3 to npm 2… on every EC2 instance, across all of our services, whenever Elastic Beanstalk deployed to a new instance. How could we automate this?

ebextensions

Luckily, Elastic Beanstalk offers a way to hook into its deploy process: by adding configuration files to a folder named .ebextensions, you can add scripts for EB to run during deploy and even overwrite its default scripts.

This let us make an ebextension file that would install an npm-downgrading script. (Note: these intermediate scripts are for illustration, not use, since the ultimate set of scripts is way better.)

# EB runs deploy scripts in alphabetical order
# (http://docs.aws.amazon.com/elasticbeanstalk/latest/dg/ebextensions.html).
# Node is installed using a script called "40install_node.sh", and `npm install` is
# run using a script called "50npm.sh", so we downgrade npm in a script called
# "45npm_downgrade.sh".


files:
  "/opt/elasticbeanstalk/hooks/appdeploy/pre/45npm_downgrade.sh":
    mode: "000755"
    owner: root
    group: users
    content: |
      #!/usr/bin/env bash

      EB_NODE_VERSION=$(/opt/elasticbeanstalk/bin/get-config optionsettings -n aws:elasticbeanstalk:container:nodejs -o NodeVersion)

      # Make sure Node binaries can be found (required to run npm).
      # And this lets us invoke npm more simply too.
      export PATH=/opt/elasticbeanstalk/node-install/node-v$EB_NODE_VERSION-linux-x64/bin:$PATH

      if [ "$(npm -v)" != "2.15.9" ]; then
        echo "Downgrading npm to 2.15.9..."
        npm install npm@2.15.9 -g
      else
        echo "npm already at 2.15.9"
      fi
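
The file name is what places the downgrade between the two default hooks. A minimal sketch of that ordering, assuming "alphabetical" means a plain byte-wise sort (forced here with LC_ALL=C); the hook_order helper is hypothetical and only for illustration:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print hook script names in the order we assume EB
# runs them (byte-wise alphabetical; LC_ALL=C keeps the sort locale-independent).
hook_order() {
  printf '%s\n' "$@" | LC_ALL=C sort
}

# The two default hook names come from the comment above; ours sorts between them.
hook_order 50npm.sh 45npm_downgrade.sh 40install_node.sh
```

With the "45" prefix, the downgrade script sorts after Node is installed and before npm install runs.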

But now we had the challenge of distributing this file across our 14 microservices. Were we going to copy-and-paste it? No way!

install-files

A while back, we made an npm package precisely to solve the problem of distributing files like this ebextension. The package is called install-files, and it lets another package install files into its host package’s directory.

Let’s say that my-microservice installs the eb-fix-npm package. The eb-fix-npm package can then call install-files source from an install script to copy the contents of source into my-microservice.
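
The effect can be approximated in plain shell. This is only a sketch of the idea, not how install-files is actually implemented: the copy_into_host function, the example.config file, and the temporary directory layout are all invented for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for the effect of `install-files source`:
# copy everything under a package's source directory (dotfiles included)
# into the host package's root.
copy_into_host() {
  local source_dir="$1" host_dir="$2"
  mkdir -p "$host_dir"
  cp -R "$source_dir"/. "$host_dir"/
}

# Simulate eb-fix-npm sitting inside my-microservice's node_modules.
host="$(mktemp -d)"
pkg="$host/node_modules/eb-fix-npm"
mkdir -p "$pkg/source/.ebextensions"
echo "files: {}" > "$pkg/source/.ebextensions/example.config"

# After the copy, the host package has its own .ebextensions folder.
copy_into_host "$pkg/source" "$host"
ls -A "$host/.ebextensions"
```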

This tool lets you share files between Node projects the same way you would share code, using npm and declarative package names/versions. And with the ability to quickly distribute changes to the script, we got ambitious.

Speeding up npm install

Simply by downgrading to npm 2, we were able to upgrade our Elastic Beanstalk environments to Node 6. But, in the process of investigating Elastic Beanstalk’s npm script, we noticed several inefficiencies.

First, it installed Node modules afresh on every deploy. So we introduced a cache:

files:
  "/opt/elasticbeanstalk/hooks/appdeploy/pre/46cache_node_modules.sh":
    mode: "000755"
    owner: root
    group: users
    content: |
      #!/usr/bin/env bash
      # Cache Node modules in /var.

      # -p makes this a no-op when the cache directory already exists.
      mkdir -p /var/node_modules
      ln -s /var/node_modules /tmp/deployment/application/
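
To see why a symlink is enough to act as a cache, the same idea can be tried in a scratch directory. This sketch assumes, as the hook above does, that every deploy starts from a fresh application directory; link_cache and the temporary paths are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper mirroring the hook: make sure the cache directory exists,
# then symlink it into the (fresh) application directory as node_modules.
link_cache() {
  local cache_dir="$1" app_dir="$2"
  mkdir -p "$cache_dir/node_modules" "$app_dir"
  ln -s "$cache_dir/node_modules" "$app_dir/"
}

cache="$(mktemp -d)"

# First "deploy": anything written through the symlink lands in the cache.
deploy1="$(mktemp -d)"
link_cache "$cache" "$deploy1"
mkdir "$deploy1/node_modules/some-module"

# Second "deploy": a brand-new application directory already sees the module.
deploy2="$(mktemp -d)"
link_cache "$cache" "$deploy2"
ls "$deploy2/node_modules"
```

Because the modules live outside the application directory, npm install on later deploys only has to fetch whatever changed.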

By comparing timestamps when tailing EB’s activity log, we could see that EB’s npm script went from taking ~4m to ~1m: a 75% speedup.
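
The timestamp comparison is easy to script. A minimal sketch, assuming GNU date and the timestamp format seen in the activity log line earlier; the to_epoch and elapsed_seconds helpers are hypothetical, and the end timestamp in the example is made up.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: seconds between two eb-activity.log timestamps
# such as 2016-12-02T01:17:44.287Z. Assumes GNU date for the -d option.
to_epoch() {
  local ts
  # Drop fractional seconds and any trailing Z, and turn the T into a space.
  ts="$(printf '%s' "$1" | sed -e 's/\..*$//' -e 's/Z$//' -e 's/T/ /')"
  date -u -d "$ts" +%s
}

elapsed_seconds() {
  echo $(( $(to_epoch "$2") - $(to_epoch "$1") ))
}

# The start time is from the log line shown earlier; the end time is invented.
elapsed_seconds 2016-12-02T01:17:44.287Z 2016-12-02T01:21:44.912Z
```

Fractional seconds are discarded, which is fine at the minute-level resolution we cared about.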

Then we noticed that EB was calling npm rebuild after installing. But modules are automatically built for the appropriate architecture when installing! The only time you need to rebuild is when the architecture changes—on configuration deploy. And on configuration deploy, EB was trying to install new modules—even though package.json doesn’t change on configuration deploy, only on application deploy.

So, no npm rebuild on application deploy, and no npm install on configuration deploy:

files:
  "/opt/elasticbeanstalk/env.vars":
    mode: "000775"
    owner: root
    group: users
    content: |
      # Exports variables for use by the other scripts below.

      EB_NODE_VERSION=$(/opt/elasticbeanstalk/bin/get-config optionsettings -n aws:elasticbeanstalk:container:nodejs -o NodeVersion)
      export PATH=/opt/elasticbeanstalk/node-install/node-v$EB_NODE_VERSION-linux-x64/bin:$PATH

  "/opt/elasticbeanstalk/hooks/appdeploy/pre/50npm.sh":
    mode: "000755"
    owner: root
    group: users
    content: |
      #!/usr/bin/env bash
      #
      # Note that this *overwrites* Elastic Beanstalk's default 50npm.sh script.

      . /opt/elasticbeanstalk/env.vars

      cd /tmp/deployment/application && npm install --production

  "/opt/elasticbeanstalk/hooks/configdeploy/pre/50npm.sh":
    mode: "000755"
    owner: root
    group: users
    content: |
      #!/usr/bin/env bash
      #
      # Note that this *overwrites* Elastic Beanstalk's default 50npm.sh script.

      . /opt/elasticbeanstalk/env.vars

      cd /tmp/deployment/application && npm rebuild --production

During application deploy, our replacement npm script now took ~10 seconds: a 95% speedup compared to the initial duration.
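
For reference, both percentages follow from the rough durations quoted above (about 4 minutes, about 1 minute, and about 10 seconds). A small sketch of the arithmetic using integer shell math; percent_saved is a hypothetical helper, and the inputs are those approximate figures rather than exact measurements.

```shell
#!/usr/bin/env bash
# Hypothetical helper: percentage of time saved, rounded down to a whole number.
percent_saved() {
  local before="$1" after="$2"
  echo $(( (before - after) * 100 / before ))
}

percent_saved 240 60   # cache alone: ~4m down to ~1m
percent_saved 240 10   # cache plus no npm rebuild: ~4m down to ~10s
```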

Bonus: EB’s configuration deploy npm script doesn't actually do anything—it uses the wrong working directory. Our script actually rebuilds your modules if, for instance, you change your Node version.

One package to fix everything

So there you have it: you can unblock upgrading your Elastic Beanstalk environments to Node 6 and virtually eliminate npm install time by installing a single package, eb-fix-npm. After installation, you’ll effectively only pay the full cost of npm install when Elastic Beanstalk spins up new EC2 instances, which start without the cache. But we think we have a way to get rid of this hiccup too. Stay tuned…

Like working on the cutting edge of JavaScript devops? Join us!