Precisely Observing Structural Page Changes

This blog post is part of the Mixmax 2017 Advent Calendar. The previous post on December 6th was about Database-backed job processing.

Mixmax is built on Gmail. Our product, and its convenience and power, depends on tight user-interface integration with Gmail. In order to add features to Gmail for our users, we need to track the structure of Gmail’s DOM and be able to manipulate it. For years, we achieved this by crafting query selectors to identify important elements within Gmail’s DOM, and continuously re-applying these selectors as the page changed. As we did this, we attached our own content to the page.

Sadly, many users previously reported performance problems with Gmail when Mixmax was installed. Our performance analysis showed significant processing time being spent in the code that observed the page for changes. This code was thus a candidate for optimization.

Problem

Our performance analysis revealed that our existing implementation configured MutationObserver instances to watch for changes anywhere within the document tree, causing major page slowdown. We used this approach so that we could observe page changes within Gmail without undue risk of breakage when Gmail’s page structure changed. To combat the performance issues endemic to responding to every little change in the DOM, we wrapped our handler in _.throttle, so that we only responded to a small fraction of changes without losing correctness. Underscore’s throttle function returns a wrapper function that checks the elapsed time since the last call, and delegates to the provided function only when sufficient time has passed. In our case, we discovered that simply running this check was enough to degrade performance: our mutation observers fired their handlers so often that they overwhelmed poor V8.
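To make the overhead concrete, here is a simplified, leading-edge-only throttle written for illustration. It is not Underscore’s source, which also schedules a trailing call with a timer. The counters show that the elapsed-time check runs on every callback, even though the handler itself runs only rarely. When mutation callbacks arrive in a flood, this per-call work is what adds up.

```javascript
// Simplified, hypothetical throttle (leading edge only, no trailing call).
// It counts how often the elapsed-time check runs versus how often the
// wrapped function is actually invoked.
function throttle(fn, waitMs, now = Date.now) {
  let last = -Infinity;
  const stats = { checks: 0, invocations: 0 };
  function wrapped(...args) {
    stats.checks += 1; // Runs for every single mutation callback.
    const t = now();
    if (t - last >= waitMs) {
      last = t;
      stats.invocations += 1;
      fn(...args);
    }
  }
  wrapped.stats = stats;
  return wrapped;
}

// Simulate 10,000 mutation callbacks arriving 1 ms apart, using a fake clock.
let clock = 0;
const handler = throttle(() => {}, 100, () => clock);
for (let i = 0; i < 10000; i++) {
  handler();
  clock += 1;
}
console.log(handler.stats); // { checks: 10000, invocations: 100 }
```

Only 1% of the calls reach the handler, but every call still pays for the wrapper.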

We wanted to maintain functionality without the massive performance penalty. Ideally, instead of watching all changes, we would identify the nodes we expect to change and observe them directly.

We realized early on that a good solution must balance performance and reliability. If we make our declarations too specific, we risk significant maintenance work whenever Gmail changes. If we make them too indirect, we risk missing crucial page updates that we rely on to introduce our own controls into Gmail.

Solution

Instead of registering global MutationObserver instances, or leaning heavily on fragile, specialized code to watch the right elements for changes, we now make heavy use of page-parser-tree, a module written by Chris Cowan at Streak. page-parser-tree observes subsets of a dynamic webpage, using declarations provided by the developer that identify important sections of the page. The declarations consist of realtime watchers and polling finders, which pick out specific elements and add them to tracked sets of elements dubbed “tags.”

The watchers declare parts of the page structure that we know will change — things like the thread and message list containers, and the compose and reply buttons. Watchers are generally hierarchical, and reference other tags to explore smaller and smaller DOM subtrees.

The finders define functions that run periodically (usually every five seconds) to detect important elements the watchers might have missed. They do so by running query selectors against the entire document. The finders will discover the same elements as our old approach, but without the performance penalties associated with the global mutation observers. The finders thus serve two purposes: to ensure that our integration doesn’t break entirely if we miss an edge-case, and to provide fallback behavior in case Gmail’s page structure changes in a way that our watchers can’t accommodate.

Should the watchers and finders identify different sets of elements, we log the reported inconsistency and some context. These reports give us a channel to proactively identify and fix regressions related to Gmail updates.
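The comparison itself can be sketched as a set difference in both directions. This is a hypothetical helper: compareTagSources is our name for illustration and is not part of page-parser-tree. The post does not show how the real check is wired up. Element identity is by reference, so plain objects stand in for DOM elements.

```javascript
// Hypothetical: compare what a tag's watchers track with what its polling
// finder found, and return a report only when they disagree.
function compareTagSources(tag, watcherElements, finderElements) {
  const watched = new Set(watcherElements);
  const found = new Set(finderElements);
  // Elements the finder saw but the watchers missed (e.g., after a Gmail change).
  const missedByWatchers = [...found].filter((el) => !watched.has(el));
  // Elements the watchers still track that the finder no longer sees.
  const missedByFinder = [...watched].filter((el) => !found.has(el));
  if (missedByWatchers.length === 0 && missedByFinder.length === 0) {
    return null; // Consistent: nothing to log.
  }
  return { tag, missedByWatchers, missedByFinder };
}

const a = { id: 'compose-a' };
const b = { id: 'compose-b' };
const report = compareTagSources('originalComposeButton', [a], [a, b]);
console.log(report.missedByWatchers.map((el) => el.id)); // [ 'compose-b' ]
```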

Under the hood, page-parser-tree uses live-set, which is essentially a set with a subscribe method that tracks changes to some group of objects. These live-sets can be converted to Observable objects, which have a slightly different subscribe method suited to a different use-case. Observable objects are useful; the Implementation and Observables sections below show them in use.
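The hypothetical MiniLiveSet below illustrates the core of “a set with a subscribe method.” It holds values with set semantics and notifies subscribers of additions and removals. It is not live-set’s API, and the real library differs in its details.

```javascript
// Hypothetical stand-in for a live set (NOT live-set's real API).
class MiniLiveSet {
  constructor() {
    this._values = new Set();
    this._listeners = new Set();
  }
  values() {
    return new Set(this._values); // Snapshot copy; callers can't mutate our state.
  }
  subscribe(listener) {
    this._listeners.add(listener);
    return { unsubscribe: () => this._listeners.delete(listener) };
  }
  add(value) {
    if (this._values.has(value)) return; // Set semantics: no duplicates.
    this._values.add(value);
    this._emit({ type: 'add', value });
  }
  remove(value) {
    if (!this._values.delete(value)) return; // Ignore values we never held.
    this._emit({ type: 'remove', value });
  }
  _emit(change) {
    for (const listener of this._listeners) listener(change);
  }
}

const threads = new MiniLiveSet();
const log = [];
const sub = threads.subscribe((change) => log.push(`${change.type}:${change.value}`));
threads.add('row-1');
threads.add('row-1'); // Ignored: already present.
threads.remove('row-1');
sub.unsubscribe();
threads.add('row-2'); // Not logged: no longer subscribed.
console.log(log); // [ 'add:row-1', 'remove:row-1' ]
```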

In addition to page-parser-tree, we have some specialized code to handle poorly supported edge-cases, and to achieve a level of reliability that page-parser-tree’s watchers cannot provide. One example is a thread navigation watcher that works whether or not the preview pane (a Gmail lab) is enabled. It emits events when the user changes the current thread. The sections below include a few other examples of this kind of code.

The new approach avoids the performance issue with global mutation observers by not using them. Profiling shows that, with page-parser-tree, watching the page is no longer a significant performance problem.

Implementation

We used to monitor changes to Gmail’s structure using global subtree MutationObserver instances. We had an ElementUtils module that provided onElementAdded and ensureElementExists functions to detect elements given a query selector. The onElementAdded utility called the given handler function whenever it noticed elements matching the query selector. The ensureElementExists function instead returned a promise that resolved to the first matching element, including elements already on the page. ensureElementExists built on onElementAdded, unsubscribing after the first element. onElementAdded used _.throttle to reduce the frequency with which it called the handler. For example, the following code would detect the compose button:

const selector = GmailSelectors.COMPOSE_BUTTON;
// Delay watching for the compose button for long enough so the loading performance
// is quick, but also short enough so the button doesn't noticeably flash the default
// gmail color.
const wait = 300;
ElementUtils.ensureElementExists(selector, wait).then((origComposeButton) => {
  // replace the compose button
});

Internally, ElementUtils used a slightly more complicated variation of the following code:

// Throttle element queries and handler calls.
const wrappedOnMutation = _.throttle(onMutation, throttleDuration, {leading: false});
const observer = new MutationObserver(wrappedOnMutation);
observer.observe(document, {
  childList: true,
  subtree: true
});
// Find existing elements. This must run after `observer` is initialized,
// because onMutation passes the observer to the handler.
onMutation();
function onMutation() {
  const elems = $(selector);
  if (elems.length) {
    handler(elems, observer);
  }
}

The new code that detects the compose button proxies through a new common interface, called UI:

UI.get('originalComposeButton').then((origComposeButton) => {
  // replace the compose button
});

Under the hood, UI uses a getFirstFromTag utility function to get the first node from the originalComposeButton tag. The getFirstFromTag function asks the page-parser-tree instance for an Observable corresponding to that tag, and unsubscribes as soon as the observable produces a compose button. This code is roughly analogous to the following, but handles numerous edge-cases:

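import PageParserTree from 'page-parser-tree';
import toValueObservable from 'live-set/toValueObservable';
const Watcher = new PageParserTree(definitions);
function getFirstFromTag(tag) {
  const deferred = $.Deferred();
  // Get an observable for the given tag.
  const observable = toValueObservable(Watcher.tree.getAllByTag(tag));
  // Subscribe to elements in the tag - include elements already in the tag. The
  // value parameter is unpacked from an object that also contains the removal
  // Promise, which we don't need for this use-case.
  let done = false;
  let subscription = null;
  subscription = observable.subscribe(({value}) => {
    if (done) return;
    done = true;
    deferred.resolve($(value.getValue()));
    // Existing elements may be delivered before subscribe() returns, in which
    // case `subscription` has not been assigned yet.
    if (subscription) subscription.unsubscribe();
  });
  // Unsubscribe here if the first element arrived during subscribe().
  if (done) subscription.unsubscribe();
  return deferred.promise();
}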
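The `done` guard in getFirstFromTag matters if the observable delivers existing values during the subscribe() call itself. The runnable sketch below assumes exactly that behavior. It uses a hypothetical replayingObservable in place of zen-observable, and a generic firstValue helper that resolves on the first value. The helper unsubscribes safely whether the value arrives synchronously or later.

```javascript
// Hypothetical observable that replays values it already holds synchronously,
// inside subscribe(), before returning the subscription. Not zen-observable.
function replayingObservable(initial) {
  const observers = new Set();
  return {
    subscribe(next) {
      let closed = false;
      const entry = (v) => {
        if (!closed) next(v);
      };
      const subscription = {
        unsubscribe() {
          closed = true;
          observers.delete(entry);
        },
      };
      observers.add(entry);
      for (const v of initial) entry(v); // Existing values, delivered synchronously.
      return subscription;
    },
    push(v) {
      for (const o of [...observers]) o(v); // Values discovered later.
    },
  };
}

// Resolve with the first value, then unsubscribe. The `done` flag covers the case
// where the first value arrives before subscribe() has returned `subscription`.
function firstValue(observable) {
  return new Promise((resolve) => {
    let done = false;
    let subscription = null;
    subscription = observable.subscribe((value) => {
      if (done) return;
      done = true;
      resolve(value);
      if (subscription) subscription.unsubscribe();
    });
    if (done) subscription.unsubscribe();
  });
}

firstValue(replayingObservable(['compose-button'])).then(console.log); // compose-button

const later = replayingObservable([]);
firstValue(later).then(console.log); // late-button
later.push('late-button');
```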

This new detection requires a deeper understanding of how the page changes. When the user simply loads their inbox, the compose button will be reachable once the loading view disappears. However, if they navigate to contacts, Gmail removes the compose button from the DOM. The compose button watcher must therefore rediscover the button when the page changes. To avoid an overly specific set of child selectors, we “jump” between well-known points in the DOM. The tag is defined as a watcher and associated finder:

watchers: [
  // The 'pageContent' source references another tag that finds a defined DOM node
  // that wraps the entire page content (minus things like top-level scripts and our
  // compose windows).
  {sources: ['pageContent'], tag: 'originalComposeButton', selectors: [
    // Use the $map operator to hop from the "pageContent" container to the left
    // sidebar container, which includes the dropdown that navigates between Mail
    // and Contacts, and is seven levels down from the "pageContent" container. Here,
    // the mapping function will be called with each element from the pageContent tag
    // (which should be a single element).
    {$map: (e) => $(e).find('.Ls77Lb')[0]},
    // We define this immediate-child selector for the .aj9 Mail sidebar container,
    // which Gmail replaces with the .aXo Contacts sidebar container.
    // page-parser-tree watches the immediate children of the sidebar, declared by
    // the previous selector, for when the .aj9 container is added and removed.
    // By declaring this element as a direct child of its parent, we discover the new
    // button when the user returns to the primary Mail view.
    '.aj9',
    // Under the .aj9 container, we again use the $map operator to jump to the
    // compose button itself.
    {$map: (e) => $(e).find('div[gh="cm"]:not(.mixmax-compose-button)')[0]}
  ]}
],
finders: {
  originalComposeButton: {
    // page-parser-tree calls this (by default) every five seconds to ensure
    // we haven't missed the compose button due to a change that impacts the above
    // selectors.
    fn: (doc) => $(doc).find(GmailSelectors.COMPOSE_BUTTON_ORIGINAL).toArray()
  }
}

Using this formulation for the watcher, we avoid specifying selectors for each element between the pageContent container and the sidebar container, and between the Mail sidebar container and the compose button itself. As such, Gmail is free to change the exact structure it uses within those DOM subtrees.
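The effect of these hops can be sketched without a browser. In the hypothetical example below, plain objects stand in for DOM nodes. A find step searches a whole subtree once, as the $map and jQuery find calls do. A child step looks only at immediate children, as the '.aj9' selector does. The class names mirror the watcher above, but the tree shape is invented. Unlike page-parser-tree, the sketch evaluates the steps once rather than keeping the result live.

```javascript
// Hypothetical tree node: a class name plus children.
function el(cls, ...children) {
  return { cls, children };
}

// Depth-first search for the first descendant with the given class.
function findFirst(root, cls) {
  for (const child of root.children) {
    if (child.cls === cls) return child;
    const hit = findFirst(child, cls);
    if (hit) return hit;
  }
  return null;
}

// Apply each step to the current set of elements, one step at a time.
function runSteps(sources, steps) {
  let current = sources;
  for (const step of steps) {
    if (step.find) {
      current = current.map((e) => findFirst(e, step.find)).filter(Boolean);
    } else if (step.child) {
      current = current.flatMap((e) => e.children.filter((c) => c.cls === step.child));
    }
  }
  return current;
}

const composeButton = el('compose');
const pageContent = el('pageContent',
  el('wrapper', el('wrapper', el('Ls77Lb', el('aj9', el('wrapper', composeButton))))));

const steps = [{ find: 'Ls77Lb' }, { child: 'aj9' }, { find: 'compose' }];
console.log(runSteps([pageContent], steps)[0] === composeButton); // true
```

The wrapper levels can change freely without breaking the find steps. Only the immediate-child relationship between the sidebar and '.aj9' is pinned down.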

The new approach is a dramatic shift in how we observe page changes. Instead of watching the entire page for any change, and then rediscovering all elements that match a given selector, we define the structural relationships between key page elements, and have page-parser-tree watch only those elements for relevant changes. The new approach is faster, reasonably reliable, and more responsive to page changes than our old method.

Limitations

Do note that page-parser-tree isn’t a silver bullet. It doesn’t support tagging the same DOM node with the same tag from multiple watchers, nor does it support unrestricted subtree fanout or deep selectors. It does have first-class support for identifying immediate children. It also provides operators to watch for attribute changes, apply arbitrary filters, map one element to another, and more.

Another important caveat is that watchers aren’t smart enough to watch for attribute changes based solely on the selector. Suppose you ask a watcher to identify elements that match .nH.id, and Gmail later removes the id class from a matched element. The element will remain in the tag. To track these changes correctly, we need the $watch operator. We use it when we watch for messages being opened and closed by the user:

watchers: [
  // Filter message containers by whether they are open, updating the tag when the
  // user opens/closes one of the messages.
  {sources: ['message'], tag: 'openMessage', selectors: [
    // page-parser-tree calls cond to determine whether a given element should be in
    // the openMessage tag, and re-evaluates a given element when any of the
    // attributes on that element change. When the attributeFilter array is provided,
    // page-parser-tree only re-evaluates when any of those attributes change.
    {$watch: {attributeFilter: ['class'], cond: (e) => $(e).hasClass('h7')}}
  ]}
]
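The re-evaluation that $watch performs can be modeled as follows. This is a hypothetical sketch: createWatchedTag is an illustrative name, not page-parser-tree’s API, and plain objects replace DOM elements. Here, cond() decides membership, and attribute changes outside attributeFilter are ignored. In the browser, a MutationObserver would report the attribute changes; this sketch calls attributeChanged by hand.

```javascript
// Hypothetical $watch-style derived tag: membership is cond(element), and it is
// re-evaluated when a watched attribute changes.
function createWatchedTag(cond, attributeFilter) {
  const members = new Set();
  function evaluate(element) {
    if (cond(element)) members.add(element);
    else members.delete(element);
  }
  return {
    members,
    // Called when a source element first appears.
    added: evaluate,
    // Called when an attribute on a source element changes.
    attributeChanged(element, attributeName) {
      if (!attributeFilter || attributeFilter.includes(attributeName)) {
        evaluate(element);
      }
    },
  };
}

const message = { classes: new Set() };
const openMessage = createWatchedTag((e) => e.classes.has('h7'), ['class']);
openMessage.added(message);
console.log(openMessage.members.has(message)); // false (not open yet)

message.classes.add('h7'); // Simulate the user opening the message.
openMessage.attributeChanged(message, 'class');
console.log(openMessage.members.has(message)); // true
```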

In another case, we identify zero or more form elements within email message bodies, and disable Gmail’s form submission warning. The old code looked like this:

ElementUtils.onElementAdded('form[action*="mixmax.com"]', (forms) => {
  forms.removeAttr('onsubmit');
  forms.on('submit', (e) => e.stopPropagation());
});

The above approach is very robust to changes within Gmail, but it has the same performance issues as every other use of the old utilities. We thus now use page-parser-tree to identify the open message container, but we can’t use it to find the actual form elements. The form elements might be anywhere within the container’s DOM subtree. page-parser-tree does not support deep selectors, because they violate the premise of not watching for global changes. Moreover, there might be more than one form, so we can’t use the $map trick from above to identify them all: $map only maps one element to another element, with no “fanout.” As such, we do not expose the form elements from our watchers or finders at all. Instead, we discover them on top of the open message container tag, and provide an interface for discovering them in the UI abstraction.

UI relies on two facts: these forms will only be present within open message containers, and they will be in the DOM as soon as the message has been opened. It can therefore find them directly with jQuery:

// The actual implementation abstracts this line as subscribe('openMessage', ...)
toValueObservable(Watcher.tree.getAllByTag('openMessage')).subscribe(({value}) => {
  const message = $(value.getValue());
  // Find the forms that have an action that submits data to a mixmax domain.
  message.find('form[action*="mixmax.com"]').each(function() {
    handler($(this));
  });
});

UI exposes the discovered forms as mixmaxForm elements, which we then use to disable the warning:

UI.added('mixmaxForm', (form) => {
  form.removeAttr('onsubmit');
  form.on('submit', (e) => e.stopPropagation());
});
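The fanout step itself is just a subtree search scoped to one container. The hypothetical findMixmaxForms below uses plain objects instead of DOM nodes, and its URLs are made up. In the browser, the jQuery find call above does the same job. The point is that it can return zero, one, or many forms, which a one-to-one $map cannot.

```javascript
// Hypothetical: search one container's whole subtree (iteratively) for forms
// whose action contains "mixmax.com". Plain objects stand in for DOM nodes.
function findMixmaxForms(container) {
  const forms = [];
  const stack = [...container.children];
  while (stack.length) {
    const node = stack.pop();
    if (node.tag === 'form' && (node.action || '').includes('mixmax.com')) {
      forms.push(node);
    }
    stack.push(...node.children);
  }
  return forms;
}

const makeNode = (tag, attrs = {}, ...children) => ({ tag, ...attrs, children });
const openMessage = makeNode('div', {},
  makeNode('form', { action: 'https://example.mixmax.com/f1' }),
  makeNode('div', {},
    makeNode('form', { action: 'https://example.com/search' }),
    makeNode('form', { action: 'https://example.mixmax.com/f2' })));

console.log(findMixmaxForms(openMessage).length); // 2
```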

A final limitation is that tags cannot receive the same element multiple times. We ran into this limitation when adding support for the preview pane. It means we must be careful when sharing watchers between multiple tags. For example, we cannot use the following to detect whether an element is non-empty:

// Do not do this!
watchers: [
  {sources: ['replyContainer'], tag: 'nonEmptyReplyContainer', selectors: [
    // Grab all immediate children of the replyContainer.
    '*',
    // Return to the replyContainer - in theory, page-parser-tree would track which
    // child elements it received the replyContainer element from, and would
    // appropriately remove the replyContainer from nonEmptyReplyContainer when all
    // the child elements are gone. page-parser-tree does not currently support this
    // use-case.
    {$map: (e) => e.parentNode}
  ]}
]

A potential solution to the above would be to replace the '*' with '*:first-child'. We also encountered this issue when attempting to reuse watchers between different preview pane states and modes, and had to wholly restructure our selectors to correctly match the message list view in all view configurations.
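The problem with the “Do not do this!” watcher can be modeled with two tiny tag implementations. Both classes are hypothetical and do not reflect page-parser-tree’s internals. SetTag models a tag that keeps a single entry per element, as described above. RefCountTag shows the reference counting that watcher would need to work correctly.

```javascript
// Hypothetical: one entry per element, so the first removal drops it.
class SetTag {
  constructor() { this.members = new Set(); }
  add(el) { this.members.add(el); }
  remove(el) { this.members.delete(el); }
  has(el) { return this.members.has(el); }
}

// Hypothetical: counts how many sources delivered the element, and drops it
// only when the last source is gone.
class RefCountTag {
  constructor() { this.counts = new Map(); }
  add(el) { this.counts.set(el, (this.counts.get(el) || 0) + 1); }
  remove(el) {
    const n = (this.counts.get(el) || 0) - 1;
    if (n > 0) this.counts.set(el, n);
    else this.counts.delete(el);
  }
  has(el) { return this.counts.has(el); }
}

// A reply container with two children; each child maps back to its parent.
const replyContainer = { name: 'replyContainer' };
for (const Tag of [SetTag, RefCountTag]) {
  const tag = new Tag();
  tag.add(replyContainer); // From child 1.
  tag.add(replyContainer); // From child 2.
  tag.remove(replyContainer); // Child 1 removed; child 2 remains.
  console.log(Tag.name, tag.has(replyContainer));
}
// SetTag false
// RefCountTag true
```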

Observables

We use Observable objects to simplify some of our interactions with page-parser-tree. A subscription to an Observable immediately receives any elements that already reside in the Observable (and in the tag), and receives subsequent elements as they’re discovered by page-parser-tree. The Observable is provided by zen-observable and created via live-set’s toValueObservable (demonstrated in getFirstFromTag in the Implementation section).

Observable subscriptions also expose handy removal Promise objects, which are resolved when the associated element has been removed from its tag. We use removal Promises in a couple of places to correctly manage the lifecycle of our own code.

For example, when Gmail is in preview pane mode (a Gmail lab), it re-renders the recipient cell in the thread row into which we inject our reminder button. To reattach the button when the row changes, our UI abstraction watches thread rows, and subscribes to an observable that observes the thread tag. The observable provides both the added element and a removal Promise. UI registers its own MutationObserver when it detects a new row, and disconnects it when the promise resolves:

const observable = toValueObservable(Watcher.tree.getAllByTag('thread'));
observable.subscribe(({value, removal}) => {
  const row = $(value.getValue());
  // Find the parent of the node of interest: the node of interest will exist now, but may be replaced.
  const recipientContainer = row.find(GmailSelectors.RECIPIENT_WRAPPER).parent()[0];
  const observer = new MutationObserver(_.throttle(() => handler(row), 10));
  // Watch the recipient container node for child list changes, so we discover the new recipient
  // wrapper when it's been replaced.
  observer.observe(recipientContainer, {
    childList: true
  });
  // When the row is removed (or we switch views), disconnect the observer.
  removal.then(() => observer.disconnect());
  // Initially call the handler with the row.
  handler(row);
});
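The lifecycle pattern above allocates one resource per added element and releases it when that element’s removal Promise resolves. The pattern can be sketched independently of Gmail. The createTagWithRemovals helper is hypothetical and is not live-set’s implementation. A plain object stands in for each MutationObserver so the example runs outside a browser.

```javascript
// Hypothetical tag that hands each subscriber {value, removal}, where `removal`
// resolves when the value is removed from the tag.
function createTagWithRemovals() {
  const resolvers = new Map();
  const listeners = [];
  return {
    subscribe(next) { listeners.push(next); },
    add(value) {
      const removal = new Promise((resolve) => resolvers.set(value, resolve));
      for (const next of listeners) next({ value, removal });
    },
    remove(value) {
      const resolve = resolvers.get(value);
      if (resolve) {
        resolvers.delete(value);
        resolve();
      }
    },
  };
}

const threadTag = createTagWithRemovals();
const activeObservers = new Set();
threadTag.subscribe(({ value, removal }) => {
  // Stand-in for `new MutationObserver(...)`: we only track that it is connected.
  const observer = { row: value };
  activeObservers.add(observer);
  removal.then(() => activeObservers.delete(observer)); // "disconnect"
});

threadTag.add('row-1');
threadTag.add('row-2');
threadTag.remove('row-1');
// Promise callbacks run as microtasks, so check after the current job finishes.
setTimeout(() => console.log(activeObservers.size), 0); // 1
```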

Our old approach had deadly performance issues. Our new approach has many nuances and complexities. This reflects the nature of the problem — it’s not easy, it has real user impact, and a solution must balance many factors.

Have a knack for engineering solutions to software problems that prioritize user experience? We’re hiring!

December 7, 2017