@rkenmi - Snowflake

Snowflake


An ID scheme for distributed systems

Updated on January 29, 2022

Introduction

Twitter's Snowflake is an ID generation scheme designed to meet all of the requirements below:

  • The ID fits within 64 bits
  • The ID is designed with distribution in mind (horizontally scaled SQL, Cassandra, etc.)
  • IDs are roughly sortable by creation time (challenging in a distributed system)

Scheme

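The three fields below total 63 bits (41 + 10 + 12); the remaining top bit of a signed 64-bit integer is left unused. From most to least significant bits, the composition is as follows: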

\([timestamp] - [worker] - [sequence]\)

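  • Timestamp (41 bits): Millisecond precision, typically counted from a custom epoch. Because the timestamp occupies the most significant bits, IDs sort roughly by creation time.
  • Worker Number (10 bits): Also known as the machine ID. A distributed database may have many workers, and each number identifies a unique machine, for up to 1024 workers (\(2^{10} = 1024\)). Worker numbers can be chosen at startup with something like ZooKeeper.
  • Sequence Number (12 bits): This can be thought of as a transaction counter on the worker machine itself, so it is tracked per worker, not globally. It distinguishes IDs generated on the same worker within the same millisecond. 12 bits allow 4096 values (\(2^{12} = 4096\)), numbered 0 through 4095, so the counter wraps back to 0 after 4095. In Twitter's implementation the counter also resets to 0 whenever the millisecond changes, and if all 4096 values are used up within one millisecond, the generator waits for the next millisecond.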
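
The layout above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Twitter's actual code: the names `compose`, `decompose`, and `SnowflakeGenerator` are hypothetical, the custom epoch `EPOCH_MS` is an arbitrary choice (2020-01-01 UTC), and the injectable `clock` parameter exists only to make the behavior easy to test.

```python
import threading
import time

# Bit widths taken from the scheme described above.
WORKER_BITS = 10
SEQUENCE_BITS = 12

MAX_WORKER = (1 << WORKER_BITS) - 1      # 1023
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1  # 4095

# Assumption: custom epoch in milliseconds (2020-01-01 00:00:00 UTC).
EPOCH_MS = 1_577_836_800_000


def compose(timestamp_ms, worker, sequence):
    """Pack the fields as [timestamp][worker][sequence], high bits first."""
    return (
        (timestamp_ms << (WORKER_BITS + SEQUENCE_BITS))
        | (worker << SEQUENCE_BITS)
        | sequence
    )


def decompose(snowflake_id):
    """Split an ID back into (timestamp_ms, worker, sequence)."""
    sequence = snowflake_id & MAX_SEQUENCE
    worker = (snowflake_id >> SEQUENCE_BITS) & MAX_WORKER
    timestamp_ms = snowflake_id >> (WORKER_BITS + SEQUENCE_BITS)
    return timestamp_ms, worker, sequence


class SnowflakeGenerator:
    """Hypothetical single-worker generator; one instance per worker number."""

    def __init__(self, worker, clock=lambda: time.time_ns() // 1_000_000):
        if not 0 <= worker <= MAX_WORKER:
            raise ValueError("worker number does not fit in 10 bits")
        self.worker = worker
        self._clock = clock            # returns Unix time in milliseconds
        self._lock = threading.Lock()  # one sequence counter shared by all threads
        self._last_ms = -1
        self._sequence = 0

    def next_id(self):
        with self._lock:
            now = self._clock() - EPOCH_MS
            if now < self._last_ms:
                # Refuse to issue IDs that could collide with earlier ones.
                raise RuntimeError("clock moved backwards")
            if now == self._last_ms:
                # Same millisecond: advance the counter, wrapping at 4096.
                self._sequence = (self._sequence + 1) & MAX_SEQUENCE
                if self._sequence == 0:
                    # Counter exhausted: busy-wait until the next millisecond.
                    while now <= self._last_ms:
                        now = self._clock() - EPOCH_MS
            else:
                # New millisecond: restart the counter.
                self._sequence = 0
            self._last_ms = now
            return compose(now, self.worker, self._sequence)
```

Calling `SnowflakeGenerator(worker=7).next_id()` repeatedly should yield increasing integers, and `decompose` recovers the three fields from any of them. The sketch does not check that the timestamp still fits in 41 bits, which a production implementation would likely want to do.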

https://github.com/twitter-archive/snowflake


Article Tags:
database, sql, twitter, snowflake, distributed systems, cassandra