Search Results

23 matches found for 'database'

Data stores in Software Architectures

... maintain the relationship between that user and the user's connections. In this case, a graph database like Neo4j is suitable, since we care about relationship properties at a high scale.

Storing passwords into a database

Don'ts Don't put raw passwords in the database Don't put encoded passwords in the database (e.g. Base64) Don't put simple hashed passwords in the database (i.

Design Concepts

... (avg.), \(100 MB\) (max) Articles: \(50 KB\) (avg.) We should also consider ballpark sizes for database objects as well: Int: 4 bytes Float: 4 bytes Datetime: 4 bytes String: Usually around 10 - 32 bytes Double: 8 bytes Pricing We could assume that 0.

Distributed scaling with Relational Databases

Background A lot of articles talk about how to scale databases. Typically, they cover the purpose and general idea of sharding and replication, but these topics are often explained separately rather than in conjunction.

Local Secondary Index vs. Global Secondary Index

Secondary Index A secondary index is used in databases to help speed up queries when we want to grab data from popular columns or if we want to do some type of key range lookup efficiently.

NoSQL - the Radical Databases

NoSQL NoSQL is a category of databases that aren't relational. For example, MySQL is a relational database, whereas MongoDB is a NoSQL database.

B-Trees vs. LSM Trees

B-Trees Modern databases are typically built on B-Trees or LSM Trees (log-structured merge trees). B-Trees are "tried and true" data structures that are popular in database usage, most notably SQL databases.

Sharding Techniques

Introduction Sharding can be summarized as a technique in which a database table is split across multiple database servers to optimize read/write performance. Benefits include: Optimized query time Instead of having one huge database table, you have multiple smaller tables on more than one machine.

RDBMS Optimization

... and improving query performance. It does this by trying to improve the read performance of a database, at the expense of losing some write performance, adding redundant copies of data or by grouping data.

Atomic operations with Elasticsearch

... Terms: Document - Serialized JSON data. A mapping of fields. Index - Roughly equivalent to a database table. An index just contains one or more documents. Bulk API is not atomic When we re-index Elasticsearch documents, documents are updated in real-time.


... Millisecond precision Worker Number (10 bits): Also known as the machine ID, in a distributed database we may have many, many workers. Each number represents a unique machine. Worker numbers can be chosen at startup with something like Zookeeper Sequence Number (12 bits): This can be thought of as the transaction ID on the worker machine itself.
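The bit layout in this excerpt can be sketched as below. This is a minimal illustration, not the article's own code: the function names are hypothetical, and because the excerpt elides the width of the timestamp field, the sketch simply places the timestamp in the remaining high bits without assuming a size.

```python
WORKER_BITS = 10    # worker number (machine ID), per the excerpt
SEQUENCE_BITS = 12  # per-worker sequence number, per the excerpt


def compose_id(timestamp_ms: int, worker: int, sequence: int) -> int:
    """Pack the three fields into one integer: timestamp | worker | sequence."""
    if not 0 <= worker < (1 << WORKER_BITS):
        raise ValueError("worker number does not fit in 10 bits")
    if not 0 <= sequence < (1 << SEQUENCE_BITS):
        raise ValueError("sequence number does not fit in 12 bits")
    if timestamp_ms < 0:
        raise ValueError("timestamp must be non-negative")
    return (
        (timestamp_ms << (WORKER_BITS + SEQUENCE_BITS))
        | (worker << SEQUENCE_BITS)
        | sequence
    )


def decompose_id(value: int) -> tuple:
    """Unpack an ID into (timestamp_ms, worker, sequence)."""
    sequence = value & ((1 << SEQUENCE_BITS) - 1)
    worker = (value >> SEQUENCE_BITS) & ((1 << WORKER_BITS) - 1)
    timestamp_ms = value >> (WORKER_BITS + SEQUENCE_BITS)
    return timestamp_ms, worker, sequence
```

Because the timestamp sits in the highest bits, IDs generated later compare greater than earlier ones, regardless of which worker produced them.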

Big Data Processing: Batching vs. Streaming

... run some kind of job/operation on it (e.g. MapReduce) and send the results to some output (e.g. a database or data warehouse). Batch processing is generally done at larger companies where huge amounts of metadata (thousands of petabytes) need to go through some transformation.

Data Sharding: Twitter Posts

... traffic, we'll say ~10k read TPS, or transactions-per-second for starters. We use a relational database for writes since it is stable, atomic, and resilient. Write performance on a single master database server will not be sufficient at this scale, so we need to incorporate a multi-master setup.


... stateless authentication, you can (and should) store session data into a distributed cache database (like Redis) so that any one of your application servers can verify the session.

RDBMS Indexing

... in this article, indexing is one of the easiest and most effective tweaks you can add to your SQL database. However, indexing might seem like magic, and you might also not be too sure which field to index in the first place.

Web Development 101

... It could be business-related data (e.g. Customer). Typically, the Model represents a database (e.g. NoSQL or RDBMS). View: Represents the UI logic of the application. For example, Pug is a template engine for Node.

Seattle Conference on Scalability: YouTube Scalability

Notes Apache isn't that great at serving static content for a large number of requests vs. NetScaler load balancing Python is fast enough There are many other bottlenecks such as waiting for calls from DB, cache, etc.

Apache Kafka and Event Streaming

... removing the message from the queue. In this sense, traditional message brokers are not like databases since messages are not durably stored. In addition, they provide no ordering guarantees for the messages in the queue.

Scaling Instagram Infrastructure

... daemon on the Postgres read replicas does cache invalidation in their local pods Problem #2: More database reads, especially for really simple stuff like number of user likes, which used to rely only on cache Solution #2: Indexed table, increased read speed by orders of magnitude.

CAP Patterns

... writes, \(3\) could be for reads. With Quorum Consensus, a versioning number is added for every database table row. This versioning number can be verified across a cluster of Read hosts, giving priority to whichever row has the highest version number among the Read hosts.
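The read side of the version check described in this excerpt might look like the following sketch. The `VersionedRow` type and `quorum_read` function are hypothetical names introduced for illustration; the sketch assumes each read host returns its copy of the row together with its version number.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VersionedRow:
    value: str
    version: int  # version number stored alongside the row


def quorum_read(replies: list, read_quorum: int) -> VersionedRow:
    """Return the freshest row once enough read hosts have answered."""
    if len(replies) < read_quorum:
        raise RuntimeError("not enough read hosts responded to form a quorum")
    # The row with the highest version number takes priority.
    return max(replies, key=lambda row: row.version)
```

For example, if three read hosts reply with versions 4, 6, and 5, the row at version 6 is the one returned.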

Sharding User IDs of Celebrities

Problem When you are partitioning (or sharding) database writes across multiple nodes based on a User ID, a typical partitioning algorithm is to use a basic hash like MD5 to have a reasonably compact (as in, low number of bits) partition ID.
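As a rough sketch of the hash-based partitioning this excerpt describes, the snippet below maps a user ID to a partition with MD5. The function name and the choice to take the first 8 bytes of the digest are assumptions made for illustration, not details from the article.

```python
import hashlib


def partition_for(user_id: int, num_partitions: int) -> int:
    """Map a user ID to a partition ID in the range [0, num_partitions)."""
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    digest = hashlib.md5(str(user_id).encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an integer, then reduce it
    # to a compact partition ID.
    return int.from_bytes(digest[:8], "big") % num_partitions
```

The mapping is deterministic, so every write for a given user ID lands on the same partition.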

Basic Encryption

There are two types of encryption: symmetric and asymmetric. In symmetric encryption, one secret key is generated, and that very same key is shared between the sender and receiver. Databases benefit from using symmetric encryption, as it is much faster than the asymmetric alternative.

Traditional Message Queues vs. Log-based Message Brokers

... removing the message from the queue. In this sense, traditional message brokers are not like databases since messages are not durably stored. In addition, they provide no strict ordering guarantees for the messages in the queue.