Adventures in the Gmail PubSub API

Or, ‘wait, that’s not what the docs say…’

Thursday, Dec 8th, 2016

Mixmax is a communications platform that brings professional communication & email into the 21st century.

This blog post is part of the Mixmax 2016 Advent Calendar. The previous post on December 7th was about rewriting 30,000 lines of code.

A few months back, we started using the Gmail PubSub API (part of the broader Google PubSub API).

It pushes notifications to an endpoint in our system whenever a user’s inbox changes. This includes new messages arriving, as well as other events, such as a message being read, or moved to a different folder.

Our experience with it has been largely positive. However, we did encounter a few gotchas that we haven’t seen documented anywhere yet.

1. It’s possible to subscribe to a user’s notifications more than once

The watch method (which subscribes to notifications regarding a user’s inbox) is supposed to be idempotent. While that’s broadly the case, we found that if you send multiple watch requests simultaneously, you end up with multiple subscriptions, meaning every event for that user gets pushed to your endpoint multiple times!

Our system was swamped by this a few days after deploy, reaching over 1000 requests per second before we shut it off.

(We were calling watch simultaneously in some cases because of the way events propagated through our internal queueing system. Duplicates sometimes occur there, but we didn’t protect against them because the method was documented as idempotent. We’ve since added a user-level lock to prevent this.)
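As a rough illustration of what a user-level lock can look like, here is a minimal in-process sketch. It is not our production code: the class name UserWatchGuard and the call_watch callback (standing in for whatever issues the Gmail watch request) are hypothetical, and a multi-server deployment would need a lock held in a shared store rather than in memory.

```python
import threading
from collections import defaultdict
from typing import Callable, Dict, Set


class UserWatchGuard:
    """Serializes watch calls per user so concurrent duplicates collapse into one."""

    def __init__(self, call_watch: Callable[[str], None]) -> None:
        # call_watch is a hypothetical wrapper around the Gmail watch request.
        self._call_watch = call_watch
        self._registry_lock = threading.Lock()
        self._user_locks: Dict[str, threading.Lock] = defaultdict(threading.Lock)
        self._watched: Set[str] = set()

    def ensure_watch(self, user_id: str) -> bool:
        """Return True if this call issued the watch request, False if it was skipped."""
        # Look up (or create) the lock for this user under the registry lock.
        with self._registry_lock:
            user_lock = self._user_locks[user_id]

        # Only one caller per user gets past this point at a time.
        with user_lock:
            with self._registry_lock:
                already_watched = user_id in self._watched
            if already_watched:
                return False
            self._call_watch(user_id)
            with self._registry_lock:
                self._watched.add(user_id)
            return True
```

Callers that race on the same user block briefly and then return False instead of sending a second request. Since watches have to be renewed periodically, a real version would also need to expire entries from the remembered set.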

2. The messagesAdded collection in history.list isn’t reliable

The push notifications from Gmail contain a historyId. You then need to query the API to get all changes that have occurred between the last historyId you saw for that user and this new historyId.
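For context, here is a minimal sketch of pulling the historyId out of a push request body. It assumes the standard Pub/Sub push envelope, where message.data is a base64url-encoded JSON object carrying emailAddress and historyId. The function name parse_gmail_push is our own for this example.

```python
import base64
import json
from typing import Any, Dict, Tuple


def parse_gmail_push(body: Dict[str, Any]) -> Tuple[str, int]:
    """Extract (emailAddress, historyId) from a decoded Pub/Sub push request body."""
    encoded = body["message"]["data"]
    # Restore any padding that was stripped from the base64url string.
    padded = encoded + "=" * (-len(encoded) % 4)
    payload = json.loads(base64.urlsafe_b64decode(padded))
    # historyId may arrive as a string or a number; normalize it to int.
    return payload["emailAddress"], int(payload["historyId"])
```

The returned historyId is the upper bound of the range to fetch. The lower bound is the last historyId stored for that user, which is what gets passed to history.list as the starting point.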

The history.list method returns an array of changes, broken down by change type. For example, labelsAdded contains details of labels being applied to messages. We were mostly interested in the messagesAdded array, which according to the documentation, represents:

Messages added to the mailbox in this history record

We built our code assuming that all new messages would be listed within messagesAdded.

However, inexplicably, that’s not always the case. We had multiple instances where new messages simply never appeared in the messagesAdded array.

We were never able to identify why this occurred. In our tests (moving a message out of the spam folder, messages sent from the user to themselves, messages skipping the inbox etc.) we could not reproduce the issue.

We did, however, find that when messages were missing from messagesAdded, they did appear in the messages array, which holds the ids of all messages modified in any way in the history record.

We now check the messages array exclusively. This is inefficient, because it means we also query messages that aren’t new but were simply changed in some way (read, moved to a different folder, archived, etc.).

But at least we can now guarantee we’re picking up all new messages.
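The workaround can be sketched as follows. This is a simplified illustration rather than our actual code: collect_message_ids is a hypothetical helper, and the sample response is made up to show a history record where a message appears in messages but not in messagesAdded.

```python
from typing import Any, Dict, List


def collect_message_ids(history_response: Dict[str, Any]) -> List[str]:
    """Return every message id touched in a history.list response, deduplicated.

    Reads the per-record `messages` array instead of `messagesAdded`, so new
    messages are picked up even when `messagesAdded` omits them.
    """
    seen = set()
    ordered: List[str] = []
    for record in history_response.get("history", []):
        for message in record.get("messages", []):
            message_id = message["id"]
            if message_id not in seen:
                seen.add(message_id)
                ordered.append(message_id)
    return ordered


# Made-up response: the second record has no messagesAdded entry at all.
sample_response = {
    "history": [
        {
            "id": "1001",
            "messages": [{"id": "m1", "threadId": "t1"}],
            "messagesAdded": [{"message": {"id": "m1", "threadId": "t1"}}],
        },
        {
            "id": "1002",
            "messages": [
                {"id": "m2", "threadId": "t2"},
                {"id": "m1", "threadId": "t1"},
            ],
        },
    ]
}

message_ids = collect_message_ids(sample_response)
```

Each id returned still has to be fetched and checked against what has already been processed, which is where the extra cost comes from.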

3. If you subscribe to push notifications for all your users, then sending out a user email blast causes some serious spikes

A few days after launch, we noticed a weird spike in push notifications (around 10x higher than normal volume).

It took a few minutes to figure out it was due to a newsletter we had sent out to our entire user base, triggering inbox events for every user simultaneously :)

Incidentally, we've also noticed that inbox events spike on the hour and half-hour. Our theory is that this is due to bulk marketing emails, which are often scheduled to go out at these times.

Enjoy discovering API quirks like these? Drop us a line.