This is the second post of twelve for Mixmax Advent 2020.
As you may have picked up from previous blog posts, MongoDB is our primary data store and source of truth at Mixmax.
At its peak, our largest cluster had 8 shards and stored 6.9TB of uncompressed document data, and over 2.7TB of indexes. As you can imagine, running this cluster wasn’t cheap. To make matters worse, DB writes took over 200ms on average.
We noticed that all of our largest indexes were compound indexes that started with a userIds field, followed by an organizationId field. As the names suggest, userIds was an array of user IDs denoting the set of users who had access to a document, and organizationId was the organization that the document belonged to. An implicit relationship was that all users referenced in the userIds field were members of the organization referenced by organizationId.
As Mixmax has grown, so have our customers. Organizations that once had 10-20 users now had 100-200. This meant that, for a given index, a single document could be responsible for hundreds of index keys. Coupled with the fact that organizations with more users tend to have proportionately more documents overall, it was clear that this architecture couldn't scale at a reasonable cost.
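To make the key explosion concrete, here's a small sketch of how MongoDB's multikey indexing expands a document into index keys for an index like { userIds: 1, organizationId: 1 }: one key per array element, per document. The document shapes below are illustrative, not taken from our actual schema.

```javascript
// For a multikey compound index { userIds: 1, organizationId: 1 },
// MongoDB stores one index key per element of the userIds array.
function indexKeysForDoc(doc) {
  return doc.userIds.map((userId) => [userId, doc.organizationId]);
}

// A 3-user organization: 3 index keys for this document.
const small = indexKeysForDoc({
  userIds: ['u1', 'u2', 'u3'],
  organizationId: 'orgA',
});

// A 200-user organization: 200 index keys for a single document,
// before even counting that larger orgs have proportionately more documents.
const large = indexKeysForDoc({
  userIds: Array.from({ length: 200 }, (_, i) => `u${i}`),
  organizationId: 'orgB',
});

console.log(small.length, large.length); // 3 200
```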
Looking closer at the data, we saw that in more than 99% of all documents, the userIds array contained more than 90% of the organization's users. We realized then that we weren't getting much value out of the leading userIds field; it was marginally more selective than the organizationId field at best, and equally selective at worst.
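A rough back-of-envelope illustrates the point. The numbers below are hypothetical (chosen to be consistent with the >90% figure above), but they show why an equality match on a userId barely narrows the scan once you've already matched on organizationId:

```javascript
// Illustrative numbers, consistent with the >90% coverage observation.
const orgDocs = 100000;       // documents belonging to one organization
const userIdsCoverage = 0.9;  // a typical doc's userIds holds 90% of org users

// Matching on organizationId alone: every org document is a candidate.
const scannedByOrgId = orgDocs;

// Prepending an equality match on one userId only excludes the ~10% of
// documents whose userIds array happens not to contain that user.
const scannedWithUserId = Math.round(orgDocs * userIdsCoverage);

console.log(scannedByOrgId, scannedWithUserId); // 100000 90000
```

In other words, the leading array field paid for hundreds of index keys per document while filtering out only a small fraction of candidates.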
The solution
Our path forward was clear, if a bit tedious: we'd have to replace all 16 of our indexes that led with the userIds field. To keep our total index size as small as possible throughout the process, we went through the indexes in ascending order of size, taking the following steps:
- Create a new index containing all but the leading userIds field of the existing one.
- Ensure the new index was in memory by running a few queries explicitly instructed to use it, confirming with WiredTiger index statistics.
- Drop the old index.
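The steps above can be sketched for one hypothetical index. The createdAt field and the documents collection name are assumptions for illustration, not from our actual schema; the mongosh calls shown in comments (createIndex, hint, $indexStats, dropIndex) are standard MongoDB operations.

```javascript
// Hypothetical old index spec; createdAt is illustrative.
const oldSpec = { userIds: 1, organizationId: 1, createdAt: -1 };

// Step 1: the new index is the old one minus its leading userIds key.
const { userIds, ...newSpec } = oldSpec; // { organizationId: 1, createdAt: -1 }

// In mongosh, the migration for this index would look roughly like:
//
//   db.documents.createIndex(newSpec);
//
//   // Step 2: warm the new index with hinted queries, then confirm
//   // it's being used via index stats:
//   db.documents.find({ organizationId: someOrgId })
//     .hint(newSpec)
//     .explain('executionStats');
//   db.documents.aggregate([{ $indexStats: {} }]);
//
//   // Step 3: drop the old index only once the new one is in memory:
//   db.documents.dropIndex(oldSpec);

console.log(Object.keys(newSpec)); // [ 'organizationId', 'createdAt' ]
```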
We knew that the leading userIds field was responsible for a lot of our total index size, but we weren't sure how much. We were floored by the results: removing the userIds field made indexes anywhere from 80 to 95 percent smaller (the percentage decrease grew linearly with the size of the original index).
Takeaways
The most selective index isn't always optimal
If a field in a compound index isn’t used for sorting, and doesn’t significantly reduce the number of documents scanned in production queries, it might be taking up more RAM than it’s worth. Especially if it’s an array field.
Chores can be impactful and fun
Serially waiting for multi-day index builds to finish wasn’t all that exciting, but seeing folks’ reactions to the resulting changes in index size and costs sure was.
It’s worth periodically revisiting your index design
As your product evolves, so will your database usage. It’s great if the design decisions you made years ago are still the right call, and it’s worth finding out. What you don’t know could be costing you thousands of dollars a month.