Beating Spam Detection Bots

Friday, Dec 8th, 2017

Mixmax is a communications platform that brings professional communication & email into the 21st century.

This blog post is part of the Mixmax 2017 Advent Calendar. The previous post, on December 7th, was "Precisely observing structural page changes".

One way that Mixmax revolutionizes your email is by enabling you to schedule meetings instantly, without back-and-forth messages.

Here's what our meeting picker looks like:

[Image: Mixmax Meeting Picker]

When a recipient receives this email, all they have to do is click one of the available timeslots, and the meeting will be scheduled instantly with automatic double-booking protection.

What Could Go Wrong?

The downside of being able to schedule meetings with a simple click is that anyone (or anything) can do it. We've noticed that some recipients have spam detection bots in their email client that click every link in an email to check that the links are safe. As a result, it looks as though the recipient tried to "book" every meeting timeslot, which causes a lot of confusion.

A Simple Solution

What's a simple way to detect if a bot tries to click all your meeting links at once? By rate-limiting the number of requests!

We set the limit on the number of meetings a recipient could schedule to 4 per 30 seconds. We thought this would be good enough to prevent bots from scheduling every timeslot in the email.

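const express = require('express');
const redis = require('redis');
const app = express();

// express-limiter is initialized with the Express app and a Redis client.
const limiter = require('express-limiter')(app, redis.createClient());

limiter({
  path: '/api/scheduleMeeting',
  method: 'get',

  // Assumed lookup key (client IP); substitute whatever identifies a recipient.
  lookup: ['connection.remoteAddress'],

  // 4 requests per 30 seconds.
  total: 4,
  expire: 30 * 1000
});

app.get('/api/scheduleMeeting', require('./scheduleMeeting'));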

What Could Go Wrong? Part 2

We encountered several issues with this simple rate-limiting approach:

  1. If the recipient tries to schedule a meeting shortly after receiving the email, they will be blocked for up to 30 seconds because their spam bot has already used up the rate limit.

  2. We noticed that bots were still able to get around our rate limiting, resulting in multiple meeting confirmations before the recipient had actually clicked a timeslot.
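The first issue can be reproduced with a small simulation. The sketch below is an assumption-laden illustration, not express-limiter itself: `createLimiter` is a hypothetical in-memory fixed-window limiter, `recipient-1` is a made-up lookup key, and the email is assumed to contain six timeslot links.

```javascript
// Minimal in-memory fixed-window limiter (hypothetical helper, not express-limiter).
function createLimiter({ total, expire }) {
  const windows = new Map(); // key -> { count, resetAt }

  return function allow(key, now) {
    const current = windows.get(key);
    if (!current || now >= current.resetAt) {
      // Start a fresh window for this key.
      windows.set(key, { count: 1, resetAt: now + expire });
      return true;
    }
    if (current.count < total) {
      current.count += 1;
      return true;
    }
    return false; // Over the limit until the window expires.
  };
}

// Simulate an email with 6 timeslot links, limited to 4 requests per 30 seconds.
function simulate() {
  const allow = createLimiter({ total: 4, expire: 30 * 1000 });
  const recipient = 'recipient-1'; // hypothetical lookup key

  // t = 0 ms: the bot "clicks" all six links at once.
  const botResults = [];
  for (let slot = 0; slot < 6; slot++) {
    botResults.push(allow(recipient, 0));
  }

  // t = 10 s: the real recipient clicks one timeslot and is rejected.
  const humanEarly = allow(recipient, 10 * 1000);

  // t = 31 s: the window has expired, so the click goes through.
  const humanLater = allow(recipient, 31 * 1000);

  return { botResults, humanEarly, humanLater };
}

console.log(simulate());
```

In this simulation the recipient's click at 10 seconds is rejected because the bot has already spent the budget. It also shows that the limit by itself still admits the first four bot clicks, which may be one contributor to the second issue.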

A Better Solution

One way to raise the bar for bots is to require that they evaluate client-side JavaScript in order to schedule a meeting. Here’s how it works:

When a meeting request comes in following a click, we don't immediately schedule the meeting. Instead, we send back a placeholder response consisting of some simple HTML. Inside the HTML, we include a couple of lines of JavaScript telling the browser to automatically resend the request with a special query param.

Here's what it looks like:

function redirectMiddleware(req, res, next) {
  // The client has evaluated the JavaScript and redirected back to this
  // URL with the special query param. Proceed with scheduling the meeting.
  if (req.query.specialParam === 'true') return next();

  // Otherwise, respond with a page whose script re-requests the same URL
  // with the special query param appended.
  return res.send(`
    <html><body>
      <script type="text/javascript">
        const query = window.location.search;
        window.location.href += (query ? '&' : '?') + 'specialParam=true';
      </script>
    </body></html>
  `);
}

app.get('/api/scheduleMeeting', redirectMiddleware, require('./scheduleMeeting'));
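One caveat with appending to `window.location.href` as a string: if a link ever carried a `#fragment`, the appended param would end up inside the fragment and never reach the server. Whether our meeting links carry fragments is not something this post covers, so treat the following as a defensive sketch only. `withSpecialParam` is a hypothetical helper built on the standard `URL` API, and the example URL and its `slot` param are made up.

```javascript
// Hypothetical helper: returns `href` with specialParam=true added to the
// query string, keeping any #fragment after the query where it belongs.
function withSpecialParam(href) {
  const url = new URL(href);
  url.searchParams.set('specialParam', 'true');
  return url.toString();
}

console.log(withSpecialParam('https://example.com/api/scheduleMeeting?slot=3#top'));
// → https://example.com/api/scheduleMeeting?slot=3&specialParam=true#top
```

In the inline script, this could be used as `window.location.replace(withSpecialParam(window.location.href))`, which also keeps the intermediate page out of the browser's history.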

This solution works because the user's browser will load the DOM and execute the JavaScript inside the HTML response, while most bots won't. If the request was initiated by the recipient clicking a timeslot, their browser will execute the JavaScript and resend the same request, except this time with our special query param.

Upon seeing this special query param in the URL the second time around, we schedule the meeting for the recipient.
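The two-request flow can be traced without a server. The sketch below reduces the middleware to its two branches and runs it against fake request and response objects. The `handle` function and the `slot` query param are hypothetical stand-ins for Express and for a real meeting link.

```javascript
// Same decision logic as the middleware above, reduced to its two branches.
function redirectMiddleware(req, res, next) {
  if (req.query.specialParam === 'true') return next();
  return res.send('<html><body><script>/* redirect with specialParam=true */</script></body></html>');
}

// Hypothetical stand-in for Express: runs the middleware against a fake
// request and reports which branch was taken.
function handle(query) {
  let outcome = null;
  const req = { query };
  const res = { send: (body) => { outcome = { scheduled: false, body }; } };
  const next = () => { outcome = { scheduled: true }; };
  redirectMiddleware(req, res, next);
  return outcome;
}

// A link checker fetches the URL but never runs the script: nothing is scheduled.
const botVisit = handle({ slot: '3' });

// A browser runs the script and comes back with the param: the meeting is scheduled.
const browserVisit = handle({ slot: '3', specialParam: 'true' });

console.log(botVisit.scheduled, browserVisit.scheduled); // → false true
```

A bot that only fetches the link gets HTML back and stops there, so the scheduling handler is never reached on its behalf.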

The Result

Even though every scheduling click now takes two requests instead of one, the extra round trip only happens when a link is clicked, and the delay is unnoticeable.

Since adding this check, we've seen a decrease in customer issues about duplicate meeting confirmations, and we were able to relax the rate limit from 4 per 30 seconds to 1 per second, improving the user experience.

Generally, distinguishing between bots and a real browser involves focusing on capabilities that the browser has and most bots do not. In our case, we relied on the browser's ability to load the DOM and execute JavaScript upon receiving an HTML response from our server to verify that a browser initiated the request. We look forward to seeing how you guard against spam bots in your product!

Interested in working on the communication platform of the future? Email us at hello@mixmax.com and follow us @Mixmax