@rkenmi - Search Results

Data stores in Software Architectures

... DynamoDB*) or Columnar Databases (Cassandra) are a good choice. Both of these databases can handle reads and writes of large amounts of unstructured data with very good latency and availability, at the cost of consistency.

July 19, 2021

Storing passwords into a database

Don'ts: Don't put raw passwords in the database. Don't put encoded passwords in the database (e.g. Base64). Don't put simple hashed passwords in the database (e.g. MD5, SHA-256). Whys: For obvious reasons, storing raw passwords means that the DBA or anyone else with access to the database can steal them.
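
The snippet is cut off before any recommended approach, so the following is only a sketch of one common alternative and not necessarily what the article proposes. It assumes a salted, deliberately slow key-derivation function (PBKDF2 from Python's standard library); the helper names and the iteration count are illustrative assumptions. It also shows why Base64 offers no protection: it is reversible by anyone.

```python
import base64
import hashlib
import hmac
import os
from typing import Tuple

# Assumed work factor for illustration only; tune it for your own hardware.
ITERATIONS = 200_000


def hash_password(password: str) -> Tuple[bytes, bytes]:
    """Return (salt, derived_key) using a fresh random salt per password."""
    salt = os.urandom(16)
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, key


def verify_password(password: str, salt: bytes, expected_key: bytes) -> bool:
    """Re-derive the key with the stored salt and compare in constant time."""
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(key, expected_key)


# Base64 is an encoding, not a protection: decoding recovers the password.
encoded = base64.b64encode(b"hunter2")
decoded = base64.b64decode(encoded)
```

Only the salt and the derived key would be stored; because each password gets its own salt, two users with the same password end up with different stored values.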

January 30, 2021

Design Concepts

... schema for the data you need to store. Then you can ask yourself whether you want to use relational or non-relational databases. Does the choice suit how read-heavy or write-heavy your application is? Do you have a lot of relationships in the data you need to store? Relational databases have the power of ACID, which makes them a tried-and-true solution if you need your data to be consistent and reliable.

November 20, 2019

Distributed scaling with Relational Databases

Background: A lot of articles talk about how to scale databases. Typically, they cover the purpose and general idea of sharding and replication, but these topics are often explained separately rather than in conjunction.

January 19, 2022

B-Trees vs. LSM Trees

B-Trees Modern databases typically use B-Trees or LSM Trees (Log structured merge trees). B-trees are "tried and true" data structures that are popular in database usage, most notably SQL databases.

March 14, 2018

Local Secondary Index vs. Global Secondary Index

Secondary Index A secondary index is used in databases to help speed up queries when we want to grab data from popular columns or if we want to do some type of key range lookup efficiently.

January 15, 2022

NoSQL - the Radical Databases

NoSQL: NoSQL is a category of databases that aren't relational. For example, MySQL is a relational database, whereas MongoDB is a NoSQL database. Before NoSQL, relational databases were the tried-and-true, prevalent and reliable data stores.

December 31, 2020

PostgreSQL - a powerhouse relational database

PostgreSQL is one of the most popular SQL database engines and is widely used in the tech industry. It is a great starting point for building out an application that needs fast reads and writes with an emphasis on reads, and guarantees ACID properties through atomic transactions with various concurrency isolation levels.

July 11, 2025

Sharding Techniques

... this article, I cover three sharding techniques to improve the read/write performance of databases. Horizontal Partitioning Partitioning is simple and straightforward to implement, but not so great for long term use.

July 31, 2020

RDBMS Optimization

... beforehand to avoid expensive join calls. Federation: Another optimization for relational databases is to simply have more than one DB. Each DB can represent one function of your application.

June 23, 2020

Snowflake

Introduction: Twitter's Snowflake is an ID generation scheme that tackles all of the requirements below: ID fits under 64 bits. ID will be used with distribution in mind (horizontal scale SQL, Cassandra, etc.

January 20, 2021

Atomic operations with Elasticsearch

Preface Elasticsearch is a distributed, open source search and analytics engine for all types of data, including textual, numerical, geospatial, structured, and unstructured. Key Terms: Document - Serialized JSON data.

May 6, 2020

Big Data Processing: Batching vs. Streaming

... may come from daily cronjobs or are exported from copies of OLTP (Online Transaction Processing) databases, such as a SQL database for inventory or customer purchases. The intuition behind the input being files is that large amounts of data simply fit more easily on disk than in memory.

January 3, 2022

Data Sharding: Twitter Posts

Scenario: Let's begin with a Twitter-like service that allows you to tweet new posts. The service has very high read and write traffic; we'll say ~10k read TPS (transactions per second) for starters.

December 28, 2021

Authentications

... can be generated with many algorithms. Auto Increment: comes out of the box in most SQL databases; doesn't scale when you increase the number of machines due to ID inconsistencies. UUID: the ID is effectively unique, since the probability of creating even one duplicate is extremely low; it's a randomly generated ID, so no input is required; it can scale as you increase the number of machines; it's 128 bits, so it can be too big. SHA256: it's a reliable and fast hash; it's a hash, so you need some input (i.

February 3, 2021

RDBMS Indexing

Introduction As illustrated in this article, indexing is one of the easiest and most effective tweaks you can add to your SQL database. However, indexing might seem like magic, and you might also not be too sure which field to index in the first place.

January 24, 2021

Web Development 101

HTTP vs. HTTPS: HTTP stands for Hypertext Transfer Protocol. It typically runs on TCP port 80. It is a protocol for transferring data, such as webpages, between browsers and servers. One major flaw with HTTP is that it is vulnerable to man-in-the-middle attacks.

November 9, 2017

Apache Kafka and Event Streaming

... removing the message from the queue. In this sense, traditional message brokers are not like databases since messages are not durably stored. In addition, it has no ordering guarantees for the messages in the queue.

March 31, 2021

Big Data Cheat Sheet

... SQL queries. The Hive Metastore is the data catalog for Hive, which helps structure data into databases, tables, and partitions. For example, data can be written to a Hive-style path such as s3://test/id=1/name=Fred, which partitions the data by the fields id and name.
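
As a rough illustration of the Hive-style path convention mentioned above, the sketch below builds and parses `key=value` path segments. The helper names `build_partition_path` and `parse_partition_path` are hypothetical and are not part of Hive or any library; real query engines handle this internally.

```python
from urllib.parse import urlparse


def build_partition_path(base: str, partitions: dict) -> str:
    """Append one key=value segment per partition field, in insertion order."""
    segments = [f"{key}={value}" for key, value in partitions.items()]
    return "/".join([base.rstrip("/")] + segments)


def parse_partition_path(path: str) -> dict:
    """Recover partition fields from the key=value segments of a path."""
    segments = urlparse(path).path.split("/")
    # Values come back as strings; the path itself carries no type information.
    return dict(segment.split("=", 1) for segment in segments if "=" in segment)


path = build_partition_path("s3://test", {"id": 1, "name": "Fred"})
fields = parse_partition_path(path)
```

Here `path` is the same `s3://test/id=1/name=Fred` string used in the example above, and `fields` maps `id` and `name` back to their string values.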

September 3, 2024

Seattle Conference on Scalability: YouTube Scalability

Notes: Apache isn't that great at serving static content for a large number of requests vs. NetScaler load balancing. Python is fast enough. There are many other bottlenecks, such as waiting for calls from the DB, cache, etc.

December 25, 2020

Scaling Instagram Infrastructure

Notes: Sending notifications to a person whose photo you liked: RabbitMQ -> Celery. Django / Python for the web server / application. PostgreSQL to store users, media, friendships, etc. Master with multiple replicas, where reads happen on replicas (Master-Slave Replication). Increased write latency is dealt with by batching requests wherever possible. Replication lag from the master to the replicas was not a big issue (for them). Cassandra NoSQL (wide column store) to store user feeds, activities, etc.

December 26, 2020

Sharding User IDs of Celebrities

Problem: When you are partitioning (or sharding) database writes across multiple nodes based on a User ID, a typical partitioning algorithm is to use a basic hash like MD5 to have a reasonably compact (as in, low number of bits) partition ID.
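
A minimal sketch of the hash-based partitioning described above, assuming a fixed partition count and string user IDs. The function name `partition_for` and the value of `NUM_PARTITIONS` are illustrative assumptions, not taken from the article.

```python
import hashlib

# Assumed partition count, chosen only for illustration.
NUM_PARTITIONS = 16


def partition_for(user_id: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a user ID to a partition by hashing it with MD5."""
    digest = hashlib.md5(user_id.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an integer, then reduce it
    # to a compact partition ID in the range [0, num_partitions).
    return int.from_bytes(digest[:8], "big") % num_partitions


# The same user ID always lands on the same partition.
celebrity_partition = partition_for("celebrity-user")
```

Because the mapping is deterministic, every write for a given user ID goes to a single partition, which is why one very active user can turn that partition into a hot spot.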

January 15, 2022

Basic Encryption

There are two types of encryption: symmetric and asymmetric. In symmetric encryption, one secret key is generated, and that very same key is shared between the sender and receiver. Databases benefit from using symmetric encryption, as it is much faster than the asymmetric alternative.

October 26, 2018

Useful Links

This is a personal list of useful resources that I come across for improving web stacks, frameworks, development, UX, and more! Software Engineering: The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!) Encryption vs.

September 4, 2017

CAP Patterns

The CAP Theorem dictates that only two of its three characteristics can be guaranteed at any given time. Intro to CAP. Consistency: every read will be based on the latest write. Availability: every request will be given a response, although the response data might be stale. Partition Tolerance: the system can handle network partitions or network failures. MTV's The Real World: If your service is in the cloud, the P (Partition Tolerance) always has to be accounted for.

December 16, 2020