Updated on July 8, 2020

In this article, I want to go over some fundamental design concepts that are useful for coming up with system design.

Requirements

Functional Requirements
- Describes specific behaviors
- i.e. If a URL is generated, it is composed of a Base64 encoded alias
Non-functional Requirements
- Describes architectural requirements
- i.e. URL redirection should occur with minimal latency
Extended Requirements
- Metrics, REST API, etc.

Capacity Estimation

Capacity can be split up into four subsections:

Traffic
Storage
Bandwidth
Memory

Traffic

Traffic means the incoming reads to your service (ingress) and the outgoing writes to your clients (egress).

For estimates, it's good to come up with a number for the read:write ratio. For example, for a URL shortener, this might be something like 100:1. To find the QPS (queries per second), coming up with a ballpark number of requests per month and using a little bit of math would be a good starting approach.

Storage

Storage is really going to depend on your service/application.

Important to know size tables:

1 byte = 8 bits
1,000 bytes = 1 KB
1,000,000 bytes = 1000 KB = 1 MB
1,000,000,000 bytes = 1,000,000 KB = 1000 MB = 1 GB
1,000,000,000,000 bytes = 1,000,000,000 KB = 1,000,000 MB = 1000 GB = 1 TB
1,000,000 = Million
1,000,000,000 = Billion
1 sec = 1000 milliseconds
60 sec = 1 minute
3600 sec = 60 minutes = 1 hour
24 * 3600 = 1440 minutes = 24 hours = 1 day

Here is a ballpark list for various content:

URLs: \(0.5 KB\)
Images: \(250 KB\) (avg.), \(10 MB\) (max)
Videos: \(2 MB\) (avg.), \(100 MB\) (max)
Articles: \(50 KB\) (avg.)

We should also consider ballpark sizes for database objects as well:

Int: 4 bytes
Float: 4 bytes
Datetime: 4 bytes
String: Usually around 10 - 32 bytes
Double: 8 bytes

Pricing

We could assume that 0.1 GB would cost about 10 cents, although nowadays this might be around 2 cents (AWS in 2020).

Bandwidth

Bandwidth is mostly about doing a little bit of math to get the bytes per second for ingress and egress.
Note that a server's bandwidth is never unlimited. For a high quality server, it can probably have about ~10 Gbps egress bandwidth.

HTTP Requests

The number of HTTP requests per second is also limited. On average, it can be anywhere from 10 to 1000. In 2008, Wikipedia had about 100-200 requests, most of them which is cached.

Memory

The 80-20 rule means that 20% of the content created can generate 80% of the traffic. This means that ideally we want to cache the 20% content here. A good estimate here is to use the bandwidth numbers to come up with the amount of storage needed to cache an entire day's worth.

API

API will vary depending on your application, but some good pointers to think about:

How flexible is your API? Does it provide choices to the user?
How do you prevent abuse of the API? (hint: provide users with a API dev key)

Database Design

It is good to start out with a sample schema for the data you need to store.

Then you can ask yourself if you want to use relational databases or non-relational databases? Does it benefit how read-heavy or write-heavy your application is? Do you have a lot of relationships with the data you need to store?

Relational databases have the power of ACID, which makes it a tried and true solution if you need your data to be consistent and reliable. Concurrency Protocols such as 2PL (Two Phase Locking) can be utilized to ensure atomic distributed transactions. The difficulty here is how would you handle scaling? Also, the schema here is fixed, and SQL joins on many tables can really impact the performance of lookups. If read-speed is important, consider adding indexes to improve read performance. An index is basically a table-of-contents data structure that points us to the location of where the actual data lives. Do note that an index means that extra writes are required to maintain the index, so it can lead to decreased write performance.

Non-relational databases don't have ACID, but they are fast and easy to horizontally scale. Some NoSQL databases are built to make it easy to add additional hosts to scale, such as Cassandra. Remember that NoSQL databases are pretty diverse category. For example, there are graph databases (GraphQL), columnar databases (similar to MySQL and tabled data), document based storage (MongoDB) or key-value stores (DynamoDB).

Key-Value Stores: Data is stored in an array of key-value pairs. The ‘key’ is an attribute name which is linked to a ‘value’. Well-known key-value stores include Redis, Voldemort, and Dynamo.

Document Databases: In these databases, data is stored in documents (instead of rows and columns in a table) and these documents are grouped together in collections. Each document can have an entirely different structure. Document databases include the CouchDB and MongoDB.

Wide-Column Databases: Instead of ‘tables,’ in columnar databases we have column families, which are containers for rows. Unlike relational databases, we don’t need to know all the columns up front and each row doesn’t have to have the same number of columns. Columnar databases are best suited for analyzing large datasets - big names include Cassandra and HBase.

Graph Databases: These databases are used to store data whose relations are best represented in a graph. Data is saved in graph structures with nodes (entities), properties (information about the entities), and lines (connections between the entities). Examples of graph database include Neo4J and InfiniteGraph.

Application Design

This is the core design of your application. This varies hugely, but it is good to come back to the earlier sections to remind yourself what kind of services you actually need to make your application work.

Here are some ideas to consider depending on the nature of your application:

Message Queue

The Publisher/Subscriber Model is a pretty popular design pattern, and message queues such as RabbitMQ are also a very popular implementation of that. With message queues, subscribers can constantly poll for the result of their request. It is a great solution for offline batch processing and making the request/response model more interactive if desired.

Ticketing Systems

Two-phase locking (for locking reservations/sessions)

Collaborative Filtering

Used by recommender systems to make automatic predictions (filtering) about the interests of a user by collecting preferences or taste information from many users (collaborating).

Read vs. Write Throttling

In general, each server has a maximum amount of concurrent connections it can handle. We have to ask ourselves - how long do these requests take?

If the write requests can take a very long time (for example, a user wants to upload a big file), this can throttle the read requests. In this case, it might be beneficial to have separate servers for read requests and separate servers for write requests.

Database Scaling

Scaling out your database means that you'll have to consider database partitioning.

Partitioning Method

Horizontal Partitioning / Data Sharding
Vertical Partitioning

Partitioning Algorithm

Range Based Partitioning
- i.e. A-E goes to DB1, F-J goes to DB2, etc.
Hash-Based Partitioning
- Hash the key to an index, where each index maps to a specific DB.
- What if you add more DB hosts? Or remove existing DB hosts?
- The validity of the hashes will break, since the increase/decrease in host size will affect the hashing algorithm.
- To counter this, use consistent hashing. When a DB host is removed, the keys in that host will go to another DB host. When a DB host is added, the keys from another DB host will be split into chunks, with some of them going to the new DB host.
Round-Robin Partitioning
- Take turns cycling through DBs. Not a great approach since this leads to unbalanced distribution of keys.

Application Cache

(Note: Let's also be cognizant of database caches)

Ask yourself the following:

What cache application is suitable? i.e. Memcached?
How much memory do you need?
- Use the result from the capacity estimation, and you should also add some allowance of free memory to prevent major issues such as out of memory errors.

Cache Eviction Policy

How is the cache used?

LRU (Least recently used, i.e. the oldest)
FIFO (First in first out)
LIFO (Last in first out)
MRU (Most recently used, i.e. the newest)
LFU (Least frequently used)
- This is like LRU, but it counts how often the item is used.

Cache Invalidation

How is the cache maintained?

Write-through
- Update the cache when the backend gets updated. Write-heavy.
Read-through (Write-around)
- Update the cache when there is a cache miss; read from the backend. Read heavy
Write-back
- Update the cache when data is stored, but do not update the backend until the specified intervals for syncing the cache -> backend.
- Good latency and high write-throughput, but data loss risk is high if the servers crash and the new write data is only in the cache.

Load Balancers

Where can we place load balancers for your application service? Should we use a hardware LB or a software LB? How will it do health checks? Can we use LBs for LBs for redundancy?

In general, hardware LBs are very expensive but very high-performance. They are hard to configure. This is a great option for load balancing user requests to the application servers.

For load balancing the application server requests to databases or other services, a software LB or hybrid LB would be a good option to reduce costs. Nginx or Elastic Load Balancing (ELB) is a good example of a software LB.

High Availability Pairs

A load balancer can also be a single point of failure, so often times it is good to have a secondary load balancer that is ready to take requests when the primary load balancer goes down. The idea of this is known as HA Pairs or High-Availability Pairs, and is effective for improving the fault tolerance of your application.

Automation

Do we need cron tasks to take care of maintenance related tasks? For example, do some DB keys need to be deleted or updated at an interval? When is the best time to do this? What are some important metrics for your service?

Article Tags:

system designunlisted

Design Concepts