Batching CloudWatch metrics

Reduce your AWS bill by $5k

Tuesday, Jan 3rd, 2017

Mixmax is a communications platform that brings professional communication & email into the 21st century.

tl;dr - We saw a noticeable decrease in our AWS bill by batching CloudWatch metrics.

Insight into all aspects of our company is core to Mixmax, so we love our metrics. We have metrics all the way from product right through to internal development. We put metrics on everything. One place we store and inspect a lot of our engineering metrics is AWS CloudWatch, as it allows us to seamlessly integrate metrics into our alerting and monitoring system.

CloudWatch Metrics

CloudWatch data metrics are awesome because you can add extra dimensions to them. This gives you the ability to segment them visually in the CloudWatch dashboard and to build highly granular alerts. Since the AWS CloudWatch API is also pretty easy to use, we can programmatically build alerts and dashboards whenever we start gathering a new data metric in AWS.

Why do we need to batch them?

For a long time, we sent metrics to AWS as soon as they happened and everything was happy. As we scaled, however, we ran into the default rate limit of 150 put-metric-data calls per second, so we decided to batch our requests. We didn’t want to jump through hoops modifying our code everywhere we send data to CloudWatch, so we open sourced a super-easy-to-use Node module for batching these put-metric-data requests: cloudwatch-metrics.

Initializing cloudwatch-metrics

By default, the library will log metrics to the us-east-1 region and read AWS credentials from the AWS SDK's default environment variables. If you want to change these values, you can call initialize:

var cloudwatchMetrics = require('cloudwatch-metrics');
cloudwatchMetrics.initialize({
    region: 'us-east-1'
});

Creating metrics

Creating a metric is pretty basic. We simply need to provide the namespace and the type of metric:

var myMetric = new cloudwatchMetrics.Metric('namespace', 'Count');

We can also add our own default dimensions:

var myMetric = new cloudwatchMetrics.Metric('namespace', 'Count', [{
    Name: 'environment',
    Value: 'PROD'
}]);

The metric constructor also accepts a set of optional arguments that control whether we actually send the metric (useful for dev environments), a callback to invoke if a request to CloudWatch fails, the interval to wait before sending metrics, and the maximum number of events to buffer before we send to CloudWatch (useful if you’re buffering a lot of events in a bursty fashion).

var myMetric = new cloudwatchMetrics.Metric('namespace', 'Count', [{
    Name: 'environment',
    Value: 'PROD'
}], {
    sendCallback: (err) => {
        if (!err) return;
        // Do your error handling here.
    },
    enabled: true, // Set to false if you don't want to send data (e.g. in a dev environment)
    maxCapacity: 30, // Number of events to buffer before sending; the default is 20
    sendInterval: 3 * 1000 // In milliseconds; the default is 5000 (five seconds)
});
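
One way to wire these options up per environment is sketched below. This is only an illustration: `metricOptionsFor` is a hypothetical helper that is not part of cloudwatch-metrics, and the `'production'` environment name is an assumption. Only the option names (`enabled`, `maxCapacity`, `sendInterval`, `sendCallback`) come from the example above.

```javascript
// Hypothetical helper (not part of cloudwatch-metrics): builds the options
// object for the Metric constructor based on the current environment.
function metricOptionsFor(env) {
    // Assumption: only the 'production' environment should send data.
    const isProd = env === 'production';
    return {
        enabled: isProd,
        maxCapacity: 30,
        sendInterval: 3 * 1000, // milliseconds
        sendCallback: (err) => {
            if (!err) return;
            // Log the failure; swap in your own error handling here.
            console.error('Failed to send metrics to CloudWatch:', err.message);
        }
    };
}

// The result would be passed as the fourth constructor argument.
const metricOptions = metricOptionsFor(process.env.NODE_ENV);
```

Keeping the decision in one place means a dev machine never needs AWS credentials just to run code that records metrics.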

Sending metrics to CloudWatch

Sending data for a metric to CloudWatch is then extremely simple:

myMetric.put(value, metric, additionalDimensions);

The only thing to keep in mind is that data is sent to CloudWatch asynchronously, so calling this function does not send anything immediately. The library waits for the sendInterval to expire or for the maxCapacity to be reached, whichever happens first.
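
To make the "whichever happens first" behavior concrete, here is a minimal sketch of interval-or-capacity batching. It is not the cloudwatch-metrics source: `Batcher` and `flushFn` are hypothetical names, and the defaults simply mirror the values quoted above.

```javascript
// Minimal sketch of interval-or-capacity batching.
// `Batcher` and `flushFn` are hypothetical; this is not the library's code.
class Batcher {
    constructor(flushFn, { maxCapacity = 20, sendInterval = 5000 } = {}) {
        this.flushFn = flushFn;
        this.maxCapacity = maxCapacity;
        this.sendInterval = sendInterval;
        this.buffer = [];
        this.timer = null;
    }

    put(item) {
        this.buffer.push(item);
        if (this.buffer.length >= this.maxCapacity) {
            // Capacity reached before the interval expired: send right away.
            this.flush();
        } else if (!this.timer) {
            // First item of a new batch: start the countdown.
            this.timer = setTimeout(() => this.flush(), this.sendInterval);
        }
    }

    flush() {
        if (this.timer) {
            clearTimeout(this.timer);
            this.timer = null;
        }
        if (this.buffer.length === 0) return;
        // Swap the buffer out before sending so new items start a fresh batch.
        const batch = this.buffer;
        this.buffer = [];
        this.flushFn(batch);
    }
}
```

With this shape, a quiet service sends one small request per interval, while a bursty one sends a request every time the buffer fills.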

Why you should batch your CloudWatch metrics

As we said before, there is a rate limit on how many put-metric-data calls you can make per second. You can of course have this limit raised, but since you’re also charged per put-metric-data request, you can save a lot of $$$ by batching your requests. In fact, we saw a very noticeable decrease in our monthly AWS bill! The one constraint to keep in mind is that put-metric-data POST requests to AWS CloudWatch are capped at 40KB per request, so there is a limit to how large a batch can be.
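
The arithmetic behind the savings can be sketched roughly as follows. The price constant and the workload figure are assumptions for illustration only, not Mixmax's numbers; check current CloudWatch pricing for your region before relying on them.

```javascript
// Rough cost arithmetic only.
// Assumption: put-metric-data requests are billed at this rate (USD).
const PRICE_PER_1000_REQUESTS = 0.01;

// Number of put-metric-data requests needed to send `dataPoints`
// when each request carries up to `batchSize` data points.
function requestsNeeded(dataPoints, batchSize) {
    return Math.ceil(dataPoints / batchSize);
}

// Estimated request cost in USD for the given workload and batch size.
function estimatedCost(dataPoints, batchSize) {
    return (requestsNeeded(dataPoints, batchSize) / 1000) * PRICE_PER_1000_REQUESTS;
}

const pointsPerMonth = 100 * 1000 * 1000; // hypothetical workload
console.log('unbatched:', estimatedCost(pointsPerMonth, 1));
console.log('batched (20 per request):', estimatedCost(pointsPerMonth, 20));
```

Since the charge is per request, the cost falls roughly in proportion to the batch size, until the 40KB request cap stops you from batching further.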

Enjoy simplifying infrastructure costs without jumping through hoops? Drop us a line.