RDBMS Optimization

Indexing

Probably the easiest tweak to implement. It can usually be done with one SQL command. However, an index should be made based on a good column. For example, if you are frequently querying your rows by timestamp, then the timestamp can be chosen for an index. Indexing can create night-and-day differences in performance for reads.

Horizontal Partitioning (Database)

Note: This should not be confused with Partitioning in Sharding.

Split your table into smaller tables, all on the same DB schema and server. Smaller index/table size = better query performance. B-Tree, LSM. This works very well when your DB becomes too big. You can do partitioning based on hashes or under custom if/else conditions.

Replication

Allows you to add more DB servers that are all consistent. They can act as a backup if one server goes down, to provide high availability. Or, you can have dedicated servers for read only (read replicas), reducing the load on the Master servers and increasing performance. You can use a reverse proxy or load balancer to decide which DB to read from. This works very well for offloading high traffic for one DB server.

Patterns

Master-Slave and Master-Master replications are the most common replication patterns. See CAP Theorem for more details on these.

Denormalization

This technique is the process of avoiding expensive joins and improving query performance. It does this by trying to improve the read performance of a database, at the expense of losing some write performance, adding redundant copies of data or by grouping data.

Example: Pre-join tables beforehand to avoid expensive join calls

Federation

Another optimization for relational databases is to simply have more than one DBs. Each DB can represent a functionality of your application. For example, you can have a Users DB, Forums DB, and Product DB, where each DB is represented by a separate cluster of machines.

The downside of this approach is that it is tricky to do cross-joins across separate DB servers since you will need to do server links. Federation is also not effective for schemas that require huge functions or tables.

Horizontal Sharding

Due to the complexity of implementing this in practice (as well as needing to make some ACID trade-offs), this isn't a recommended optimization unless all other options have been exhausted.

  • Like Horizontal Partitioning, but across different servers and multiple instances of the same schema.
  • It can scale, but it's difficult if not impossible to maintain ACID properties. What happens if one sub-transaction to Server A succeeds but another sub-transaction to Server B fails?
    • How can you rollback?
  • You can use consistent hashing so that a key would always map to the same server.