Search Results

23 matches found for 'databases'

Data stores in Software Architectures

... DynamoDB*) or Columnar Databases (Cassandra) are a good choice. Both of these databases can handle reads and writes of large amounts of unstructured data with very good latency and availability, at the cost of consistency.

Storing passwords into a database

Don'ts: Don't put raw passwords in the database. Don't put encoded passwords in the database (e.g. Base64). Don't put simple hashed passwords in the database (e.g. MD5, SHA-256). Whys: For obvious reasons, putting raw passwords means that the DBA or anyone who has access to the database can steal the passwords.
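
The snippet lists what to avoid but is cut off before any recommendation. As a hedged illustration only (not necessarily what the article proposes), a common alternative is a salted, deliberately slow key-derivation function. The sketch below uses PBKDF2 from Python's standard library; the function names and the iteration count are assumptions chosen for the example.

```python
import hashlib
import hmac
import os

# Assumed work factor for illustration; tune it for your own hardware.
ITERATIONS = 200_000


def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, digest) for storage; the raw password is never stored."""
    salt = os.urandom(16)  # a fresh random salt per password
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Recompute the digest with the stored salt and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return hmac.compare_digest(candidate, expected)
```

Because each password gets its own salt, two users with the same password end up with different stored digests, which is what a plain MD5 or SHA-256 hash fails to provide.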

Design Concepts

... schema for the data you need to store. Then you can ask yourself whether you want to use relational or non-relational databases. Does the choice suit how read-heavy or write-heavy your application is? Do you have a lot of relationships in the data you need to store? Relational databases have the power of ACID, which makes them a tried-and-true solution if you need your data to be consistent and reliable.

Distributed scaling with Relational Databases

Background: A lot of articles talk about how to scale databases. Typically, they cover the purpose and general idea of sharding and replication, but these topics are often explained separately rather than in conjunction.

Local Secondary Index vs. Global Secondary Index

Secondary Index A secondary index is used in databases to help speed up queries when we want to grab data from popular columns or if we want to do some type of key range lookup efficiently.

NoSQL - the Radical Databases

NoSQL: NoSQL is a category of databases that aren't relational. For example, MySQL would be a relational database, whereas MongoDB would be a NoSQL database. Back then, relational databases were the tried-and-true, prevalent and reliable data stores.

B-Trees vs. LSM Trees

B-Trees: Modern database storage engines are typically built on B-Trees or LSM Trees (log-structured merge trees). B-Trees are "tried and true" data structures that are popular in database usage, most notably SQL databases.

Sharding Techniques

... this article, I cover three sharding techniques to improve the read/write performance of databases. Horizontal Partitioning: Partitioning is simple and straightforward to implement, but not so great for long-term use.

RDBMS Optimization

... beforehand to avoid expensive join calls. Federation: Another optimization for relational databases is to simply have more than one DB. Each DB can represent a function of your application.

Atomic operations with Elasticsearch

Preface Elasticsearch is a distributed, open source search and analytics engine for all types of data, including textual, numerical, geospatial, structured, and unstructured. Key Terms: Document - Serialized JSON data.


Introduction: Twitter's Snowflake is an ID generation scheme that tackles all of the requirements below: ID fits under 64 bits. ID will be used with distribution in mind (horizontal scale SQL, Cassandra, etc.
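
To make the 64-bit requirement concrete, here is a minimal sketch of a Snowflake-style generator. It assumes the commonly documented layout (41 bits of millisecond timestamp since a custom epoch, 10 bits of machine ID, 12 bits of per-millisecond sequence); the class name, method name, and clock-handling policy are hypothetical choices for this example, not Twitter's implementation.

```python
import threading
import time

EPOCH_MS = 1288834974657  # custom epoch used by Twitter's Snowflake
MACHINE_BITS = 10         # assumed layout: up to 1024 machines
SEQUENCE_BITS = 12        # assumed layout: up to 4096 IDs per ms per machine
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1


class SnowflakeGenerator:
    """Hypothetical generator: timestamp | machine ID | sequence."""

    def __init__(self, machine_id: int) -> None:
        if not 0 <= machine_id < (1 << MACHINE_BITS):
            raise ValueError("machine_id does not fit in MACHINE_BITS")
        self.machine_id = machine_id
        self.last_ms = -1
        self.sequence = 0
        self.lock = threading.Lock()

    def next_id(self) -> int:
        with self.lock:
            # If the clock steps backwards, keep using the last timestamp.
            now = max(int(time.time() * 1000), self.last_ms)
            if now == self.last_ms:
                self.sequence = (self.sequence + 1) & MAX_SEQUENCE
                if self.sequence == 0:
                    # Sequence exhausted: spin until the next millisecond.
                    while now <= self.last_ms:
                        now = int(time.time() * 1000)
            else:
                self.sequence = 0
            self.last_ms = now
            return (
                ((now - EPOCH_MS) << (MACHINE_BITS + SEQUENCE_BITS))
                | (self.machine_id << SEQUENCE_BITS)
                | self.sequence
            )
```

Since the timestamp occupies the high bits, IDs from one generator sort roughly by creation time, and distinct machine IDs keep generators on different nodes from colliding without any coordination.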

Big Data Processing: Batching vs. Streaming

... may come from daily cronjobs or are exported from copies of OLTP (Online Transaction Processing) databases, such as a SQL database for inventory or customer purchases. The intuition behind the input being files is that large amounts of data simply fit more easily on disk than in memory.

Data Sharding: Twitter Posts

Scenario: Let's begin with a Twitter-like service that allows you to tweet new posts. The service has very high read and write traffic; we'll say ~10k read TPS (transactions per second) for starters.


... can be generated with many algorithms: Auto Increment: Comes out of the box in most SQL databases. Doesn't scale when you increase the number of machines due to ID inconsistencies. UUID: ID is practically unique; the probability of creating even one duplicate is extremely low. It's a randomly generated ID, so no input is required. Can scale as you increase the number of machines. It's 128 bits, so it can be too big. SHA256: It's a reliable and fast hash. It's a hash, so you need some input (i.

RDBMS Indexing

Introduction As illustrated in this article, indexing is one of the easiest and most effective tweaks you can add to your SQL database. However, indexing might seem like magic, and you might also not be too sure which field to index in the first place.

Web Development 101

HTTP vs. HTTPS: HTTP stands for Hypertext Transfer Protocol. It typically runs on TCP port 80. It is a protocol for transferring data, such as webpages, between browsers and servers. One major flaw with HTTP is that it is vulnerable to man-in-the-middle attacks.

Apache Kafka and Event Streaming

... removing the message from the queue. In this sense, traditional message brokers are not like databases, since messages are not durably stored. In addition, they offer no ordering guarantees for the messages in the queue.

Seattle Conference on Scalability: YouTube Scalability

Notes: Apache isn't that great at serving static content for a large number of requests vs. NetScaler load balancing. Python is fast enough; there are many other bottlenecks, such as waiting for calls from DB, cache, etc.

Scaling Instagram Infrastructure

Notes: Sending notifications to a person whose photo you liked: RabbitMQ -> Celery. Django / Python for the web server / application. PostgreSQL to store users, media, friendships, etc. Master with multiple replicas, where reads happen on replicas (Master-Slave Replication). Increased write latency is dealt with by batching requests wherever possible. Replication lag from the master to the replicas was not a big issue (for them). Cassandra NoSQL (wide column store) to store user feeds, activities, etc.

CAP Patterns

The CAP Theorem dictates that only two of its three characteristics can be guaranteed at any given time. Intro to CAP: Consistency: Every read is based on the latest write. Availability: Every request is given a response, although the response data might be stale. Partition Tolerance: The system can handle network partitions or network failures. MTV's The Real World: If your service is in the cloud, the P (Partition Tolerance) always has to be accounted for.

Sharding User IDs of Celebrities

Problem When you are partitioning (or sharding) database writes across multiple nodes based on a User ID, a typical partitioning algorithm is to use a basic hash like MD5 to have a reasonably compact (as in, low number of bits) partition ID.
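
As a minimal sketch of the hash-based partitioning described here, the function below maps a user ID to a partition by hashing it with MD5 and reducing the result modulo the partition count. The function name, the use of the first 8 digest bytes, and the modulo step are assumptions made for illustration; the article's exact scheme is not shown in this snippet.

```python
import hashlib


def partition_for(user_id: str, num_partitions: int) -> int:
    """Map a user ID to a partition in [0, num_partitions)."""
    digest = hashlib.md5(user_id.encode("utf-8")).digest()
    # Use the first 8 bytes as an unsigned integer, then reduce it
    # to the number of available partitions.
    return int.from_bytes(digest[:8], "big") % num_partitions


def writes_per_partition(user_ids: list[str], num_partitions: int) -> list[int]:
    """Count how many writes each partition would receive."""
    counts = [0] * num_partitions
    for user_id in user_ids:
        counts[partition_for(user_id, num_partitions)] += 1
    return counts
```

The mapping is deterministic, so every write keyed on the same user ID lands on the same partition. For a user with an outsized share of traffic, that concentrates load on one node, which is presumably the problem the title alludes to.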

Basic Encryption

There are two types of encryption: symmetric and asymmetric. In symmetric encryption, one secret key is generated, and that very same key is shared between the sender and receiver. Databases benefit from using symmetric encryption, as it is much faster than the asymmetric alternative.

Traditional Message Queues vs. Log-based Message Brokers

... removing the message from the queue. In this sense, traditional message brokers are not like databases, since messages are not durably stored. In addition, they offer no strict ordering guarantees for the messages in the queue.