Sharding User IDs of Celebrities

Problem

When you are partitioning (or sharding) database writes across multiple nodes based on a User ID, a typical partitioning algorithm is to use a basic hash like MD5 to have a reasonably compact (as in, low number of bits) partition ID. For the majority of users, this doesn't pose a problem. A user's writes or reads will always go to the node that is associated to the user's partition ID.

When some users are celebrities, they are extremely popular, introducing a hotspot where an incredible amount of writes/reads will be made on the partition ID for these users. As a consequence, a few nodes that have celebrity data may have incredibly high resource utilization compared to other nodes.

Solution 1 - Append a suffix to the hash

If you know which keys are hotspots, then one workaround is to append a counter ID to the celebrity user IDs. For example, if the ID of Jason Momoa is JMomoa76, then you can add a 1 digit number to the ID starting at 0. For each incoming write request, you can append the ID by 1, and reset it back to 0 after exhausting all possible digits.

On the first write request, you will then have the ID as JMomoa760 which will be hashed to a partition ID \(sj38bn\). Another write request comes in, and you will hash JMomoa761 which will be hashed to a partition ID \(f73h1b\). This allows you to evenly distribute write requests reducing the likelihood of hotspots.

One notable downside of this approach is that when you need to do a read for JMomoa76's data, it is now unclear where you would fetch the data. Since the data for JMomoa76 can now be stored in the partitions for JMomoa760, JMomona761, JMomoa762, and so on, we would need to do a read query on all partitions and merge the data. As a consequence, there will be added latency and total roundtrip times when a user wants to fetch JMomoa76's posts. Another thing to keep in mind is that this incrementing counter is a very simple example, and the counter can be larger if you'd like (say, 2 or 3 digits => 100 or 1000 partitions). This counter, as well as the known list of hotspot keys, must be tracked or bookmarked somewhere in the application logic.