Adventures in the Gmail PubSub API


This blog post is part of the Mixmax 2016 Advent Calendar. The previous post on December 7th was about rewriting 30,000 lines of code.

A few months back, we started using the Gmail PubSub API (part of the broader Google PubSub API).

It pushes notifications to an endpoint in our system whenever a user’s inbox changes. This includes new messages arriving, as well as other events, such as a message being read, or moved to a different folder.

Our experience with it has been largely positive. However, we did encounter a few gotchas that we haven’t seen documented anywhere yet.

1. It’s possible to subscribe to a user’s notifications more than once

The `watch` method (which subscribes to notifications regarding a user’s inbox) is supposed to be idempotent. While that’s broadly the case, we found that if you send multiple `watch` requests simultaneously, you end up with multiple subscriptions, meaning every event for that user gets pushed to your endpoint multiple times!

Our system was swamped by this a few days after deploy, reaching over 1000 requests per second before we shut it off.

(We were calling `watch` simultaneously in some cases because of the way events propagated through our internal queueing system. Duplicates sometimes occur there, but we didn’t protect against them because the method was documented as idempotent. We’ve since added a user-level lock to prevent this.)
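To illustrate the idea of a user-level lock, here is a minimal in-process sketch. It is not our production code: `UserWatchGuard` and `watch_fn` are hypothetical names, and `watch_fn` stands in for whatever function actually sends the `watch` request. It also assumes a single process; with several workers the lock would have to live in a shared store instead.

```python
import threading
from typing import Callable, Dict


class UserWatchGuard:
    """Ensures that `watch` calls for the same user never overlap (single process only)."""

    def __init__(self, watch_fn: Callable[[str], None]) -> None:
        # watch_fn is a hypothetical wrapper around the real `watch` request.
        self._watch_fn = watch_fn
        self._registry_lock = threading.Lock()
        self._user_locks: Dict[str, threading.Lock] = {}

    def _lock_for(self, user_id: str) -> threading.Lock:
        # Guard the registry itself so two threads cannot create two locks for one user.
        with self._registry_lock:
            return self._user_locks.setdefault(user_id, threading.Lock())

    def watch(self, user_id: str) -> None:
        # Calls for the same user run one after another; different users do not block each other.
        with self._lock_for(user_id):
            self._watch_fn(user_id)
```

With this in place, duplicate events from the queue still trigger several `watch` calls, but they reach the API sequentially, which is the case where the method behaved idempotently for us.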

2. The `messagesAdded` collection in `history.list` isn’t reliable

The push notifications from Gmail contain a `historyId` – you then need to query the API to get all changes which have occurred between the last `historyId` you saw for that user, and this new `historyId`.
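As a rough sketch of the first step, the snippet below pulls the `historyId` out of a push request body. It assumes the standard Pub/Sub push envelope, where `message.data` is base64-encoded JSON carrying `emailAddress` and `historyId`; the function name is ours for illustration, and the envelope shape is worth checking against Google's documentation.

```python
import base64
import json
from typing import Tuple


def decode_gmail_notification(envelope: dict) -> Tuple[str, int]:
    """Return (email_address, history_id) from a Pub/Sub push request body."""
    data = envelope["message"]["data"]
    # Restore any stripped base64 padding before decoding.
    padded = data + "=" * (-len(data) % 4)
    payload = json.loads(base64.urlsafe_b64decode(padded))
    # historyId may arrive as a string or a number; normalise it to int.
    return payload["emailAddress"], int(payload["historyId"])
```

The returned `historyId` is the upper bound of the range; the lower bound is the last `historyId` you stored for that user, which is what you pass to `history.list` as the starting point.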

The `history.list` method returns an array of changes, broken down by change type. For example, `labelsAdded` contains details of labels being applied to messages. We were mostly interested in the `messagesAdded` array, which according to the documentation, represents:

> Messages added to the mailbox in this history record

We built our code assuming that all new messages would be listed within `messagesAdded`.

However, inexplicably, that’s not always the case. We had multiple instances where new messages simply never appeared in the `messagesAdded` array.

We were never able to identify why this occurred. In our tests (moving a message out of the spam folder, messages sent from the user to themselves, messages skipping the inbox etc.) we could not reproduce the issue.

We did, however, find that when messages were missing from `messagesAdded`, they did appear in the `messages` array, which holds the ids of _all_ messages modified in any way in the history record.

We now check the `messages` array exclusively. This is inefficient, because it means we’re also querying messages which aren’t new but were simply changed in some way (read, moved to a different folder, archived, etc.).

But at least we can now guarantee we’re picking up all new messages.
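A simplified sketch of that approach, operating on history records that have already been parsed into dictionaries (the function name is hypothetical, and this is an illustration rather than our actual code):

```python
from typing import Iterable, List


def candidate_message_ids(history_records: Iterable[dict]) -> List[str]:
    """Collect the ids of every message touched in the given history records.

    Reads the `messages` array rather than `messagesAdded`, so the result
    includes messages that were merely modified as well as new ones.
    """
    seen = set()
    ordered: List[str] = []
    for record in history_records:
        for message in record.get("messages", []):
            message_id = message["id"]
            # The same message can appear in several records; keep the first occurrence.
            if message_id not in seen:
                seen.add(message_id)
                ordered.append(message_id)
    return ordered
```

Each id returned still has to be checked against what you have already stored to decide whether the message is actually new, which is where the extra cost comes from.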

3. If you subscribe to push notifications for all your users, then sending out a user email blast causes some serious spikes

A few days after launch, we noticed a weird spike in push notifications (around 10x higher than normal volume).

It took a few minutes to figure out it was due to a newsletter we had sent out to our entire user base, triggering inbox events for every user simultaneously 🙂

Incidentally, we’ve also noticed that inbox events spike on the hour and half-hour. Our theory is that this is due to bulk marketing emails, which are often scheduled to go out at these times.

Enjoy discovering API quirks like these? Drop us a line.


Written By

Cameron Price-Austin
