Search Results


15 matches found for 'distributed systems'

A primer on MapReduce

To understand this popular backend technology called MapReduce, let's first take a look at Map and Reduce. Terminology: The terms Map and Reduce actually come from two well-known higher-order functions used in functional programming.
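A minimal, single-process Python illustration of the two higher-order functions, using the built-in `map` and `functools.reduce` to count words. The sample documents are made up, and this is only a sketch of the idea, not a distributed MapReduce job.

```python
from collections import Counter
from functools import reduce

documents = ["the quick brown fox", "the lazy dog", "the fox"]

# Map: apply a function to each element independently
# (here, turn each document into its own word counts).
mapped = map(lambda doc: Counter(doc.split()), documents)

# Reduce: fold the per-document counts into one result with a binary function.
word_counts = reduce(lambda acc, counts: acc + counts, mapped, Counter())

print(word_counts["the"])  # 3
print(word_counts["fox"])  # 2
```

Because each map call depends only on its own input, the map step can be spread across machines, which is the property MapReduce exploits.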


Snowflake

Introduction: Twitter's Snowflake is an ID generation scheme that meets all of the requirements below: the ID fits in 64 bits; the ID will be used with distribution in mind (horizontally scaled SQL, Cassandra, etc. ...
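A rough Python sketch of a Snowflake-style generator, assuming the layout Twitter published for the original scheme: a 41-bit millisecond timestamp measured from a custom epoch (1288834974657), a 10-bit machine ID, and a 12-bit per-millisecond sequence, with the sign bit left at 0. The class and method names are hypothetical, and the datacenter/worker split of the 10 machine bits is not modeled.

```python
import threading
import time

# Assumed layout: [0 sign bit][41-bit timestamp][10-bit machine ID][12-bit sequence]
TWITTER_EPOCH_MS = 1288834974657
MACHINE_ID_BITS = 10
SEQUENCE_BITS = 12
MAX_MACHINE_ID = (1 << MACHINE_ID_BITS) - 1
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1


class SnowflakeGenerator:
    """Hypothetical generator: unique per machine, roughly time-ordered IDs."""

    def __init__(self, machine_id: int) -> None:
        if not 0 <= machine_id <= MAX_MACHINE_ID:
            raise ValueError("machine_id must fit in 10 bits")
        self.machine_id = machine_id
        self.last_ms = -1
        self.sequence = 0
        self.lock = threading.Lock()

    @staticmethod
    def _now_ms() -> int:
        return int(time.time() * 1000)

    def next_id(self) -> int:
        with self.lock:
            now = self._now_ms()
            if now < self.last_ms:
                raise RuntimeError("clock moved backwards; refusing to generate IDs")
            if now == self.last_ms:
                # Same millisecond: bump the sequence, wrapping at 12 bits.
                self.sequence = (self.sequence + 1) & MAX_SEQUENCE
                if self.sequence == 0:
                    # Sequence exhausted for this millisecond: spin until the next one.
                    while now <= self.last_ms:
                        now = self._now_ms()
            else:
                self.sequence = 0
            self.last_ms = now
            return (
                ((now - TWITTER_EPOCH_MS) << (MACHINE_ID_BITS + SEQUENCE_BITS))
                | (self.machine_id << SEQUENCE_BITS)
                | self.sequence
            )


gen = SnowflakeGenerator(machine_id=7)
first, second = gen.next_id(), gen.next_id()
print(first < second)  # True: later IDs sort after earlier ones
```

Putting the timestamp in the high bits is what makes the IDs sortable by creation time, while the machine ID lets nodes generate IDs without coordinating with each other.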


2PC - Two Phase Commit and Why it Sucks

Background: Two-Phase Commit (2PC) is a protocol used to achieve atomic writes in distributed systems. It was a novel concept in the 1970s and had good intentions, but in practice its implementations fall short; for example, it is a blocking protocol, so participants that have voted to commit can be left waiting if the coordinator fails.
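A toy, in-memory Python sketch of the two phases, assuming hypothetical `Participant` objects that answer instantly and never crash. A real implementation would durably log each vote and decision and handle timeouts, which is exactly where 2PC gets painful.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Participant:
    # Hypothetical in-memory participant; a real one would write its vote to a durable log.
    name: str
    can_commit: bool = True
    state: str = "init"

    def prepare(self) -> bool:
        # Phase 1: vote "yes" only if the local write can be made durable.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self) -> None:
        self.state = "committed"

    def abort(self) -> None:
        self.state = "aborted"


def two_phase_commit(participants: List[Participant]) -> bool:
    # Phase 1 (prepare/vote): ask every participant whether it can commit.
    votes = [p.prepare() for p in participants]
    # Phase 2 (commit/abort): commit only on a unanimous "yes", otherwise abort everyone.
    if all(votes):
        for p in participants:
            p.commit()
        return True
    for p in participants:
        p.abort()
    return False


nodes = [Participant("a"), Participant("b")]
print(two_phase_commit(nodes))  # True
```

The gap between the two loops is the weak spot: a participant in the "prepared" state cannot decide on its own, so if the coordinator dies there it has to wait, typically while still holding locks.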


Distributed scaling with Relational Databases

... SQL database, and how that becomes much fuzzier when you move SQL onto distributed systems and want to keep the same ACID guarantees, such as atomicity or strong consistency.


CAP Patterns

The CAP theorem states that a distributed system can guarantee at most two of its three properties at the same time. Intro to CAP: Consistency means every read reflects the latest write; Availability means every request receives a response, although the response data might be stale; Partition Tolerance means the system keeps working despite network partitions or network failures. MTV's The Real World: If your service is in the cloud, the P (partition tolerance) always has to be accounted for.


Comparison Charts of File Storage Formats

Big Data Encodings: These encodings are often used with HDFS or some other distributed file system. Since the data can be as large as terabytes or petabytes, it is crucial to encode files in a space-efficient way that also allows them to be read and written efficiently.


NoSQL - the Radical Databases

NoSQL: NoSQL is a category of databases that aren't relational. For example, MySQL is a relational database, whereas MongoDB is a NoSQL database. Back then, relational databases were the tried-and-true, prevalent, and reliable data stores.


Data stores in Software Architectures

Use Cases: There are many ways to store your data. In this article we'll walk through some examples of data storage in common system designs. Reminder: there is no single best storage choice; the right one can vary heavily depending on factors such as access patterns and scale.


Design Concepts

In this article, I want to go over some fundamental design concepts that are useful when coming up with a system design. Requirements: Functional requirements describe specific behaviors, e.g., if a URL is generated, it is composed of a Base64-encoded alias. Non-functional requirements describe architectural requirements i. ...
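To make the functional-requirement example concrete, here is a small Python sketch of one way a Base64-encoded alias could be derived: hash the long URL and URL-safe Base64-encode the first few bytes of the digest. The function name and the 6-byte length are hypothetical choices, and collision handling is deliberately left out.

```python
import base64
import hashlib


def make_alias(long_url: str, num_bytes: int = 6) -> str:
    """Hypothetical helper: derive a short, URL-safe alias from a long URL."""
    digest = hashlib.sha256(long_url.encode("utf-8")).digest()
    # 6 bytes = 48 bits = exactly 8 Base64 characters, so no '=' padding appears.
    # urlsafe_b64encode uses '-' and '_' instead of '+' and '/', which are awkward in URLs.
    return base64.urlsafe_b64encode(digest[:num_bytes]).decode("ascii")


alias = make_alias("https://example.com/some/very/long/path?with=query")
print(len(alias))  # 8
```

A spec phrased this way is testable: the alias length and alphabet can be checked directly, whereas a non-functional requirement such as latency or availability is measured on the running system.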


Asynchrony vs. Multithreading

Asynchrony: Asynchronous programming, closely associated with event-driven programming, is built on the foundation of futures/promises. The basic idea is that instead of having a thread wait for a blocked call to finish (i. ...
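A short Python `asyncio` sketch of the idea: `asyncio.gather` wraps each coroutine in a Task (a Future subclass), and a single thread runs both "calls" concurrently because awaiting yields control back to the event loop. The `fetch` coroutine and its delays are hypothetical stand-ins for real I/O.

```python
import asyncio
import time
from typing import List


async def fetch(name: str, delay: float) -> str:
    # Stand-in for a slow I/O call; awaiting hands control back to the
    # event loop instead of parking the thread.
    await asyncio.sleep(delay)
    return f"{name} done"


async def main() -> List[str]:
    # Both calls are in flight at the same time on one thread.
    return await asyncio.gather(fetch("a", 0.2), fetch("b", 0.2))


start = time.perf_counter()
results = asyncio.run(main())
elapsed = time.perf_counter() - start
print(results)  # ['a done', 'b done']
# elapsed is close to 0.2s, not the 0.4s that two sequential blocking calls would take
```

Multithreading would get a similar speedup by giving each call its own blocked thread; the asynchronous version gets it without extra threads.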


Authentications

Authentication: Authentication means verifying who you are. Basic Auth: The credentials required for login are encoded with Base64, which is very easy to decode. It is not recommended and is probably the least secure authentication method, but it is easy to implement.
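A quick Python illustration of why Base64 offers no protection: the standard Basic Auth header is just `username:password` Base64-encoded, and anyone who sees it can reverse it in one call. The username and password are made up, and the helper name is hypothetical.

```python
import base64


def basic_auth_header(username: str, password: str) -> str:
    # Basic Auth sends "username:password" Base64-encoded. This is encoding, not encryption.
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


header = basic_auth_header("alice", "s3cret")
print(header)

# Anyone who intercepts the header can recover the credentials:
decoded = base64.b64decode(header.split(" ", 1)[1]).decode("utf-8")
print(decoded)  # alice:s3cret
```

This is why Basic Auth should only ever travel over HTTPS.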


Local Secondary Index vs. Global Secondary Index

Secondary Index: A secondary index is used in databases to help speed up queries when we want to look up data by frequently queried columns other than the primary key, or when we want to perform key-range lookups efficiently. Secondary indices are used in relational databases (e. ...
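A toy Python sketch of what a secondary index buys you: the table is keyed by primary key, and a separate sorted list of `(age, primary_key)` pairs lets a range query on `age` use binary search instead of scanning every row. The table, column, and function names are hypothetical, and real databases typically use B-trees or similar structures rather than a sorted list.

```python
import bisect
from typing import List

# Hypothetical table keyed by primary key (user_id).
users = {
    1: {"name": "ana", "age": 31},
    2: {"name": "ben", "age": 25},
    3: {"name": "cy", "age": 31},
    4: {"name": "dee", "age": 40},
}

# Secondary index on "age": sorted (age, user_id) pairs pointing back at primary keys.
age_index = sorted((row["age"], pk) for pk, row in users.items())


def find_by_age_range(lo: int, hi: int) -> List[int]:
    # Binary-search the index for the first and one-past-last matching entries.
    start = bisect.bisect_left(age_index, (lo, float("-inf")))
    end = bisect.bisect_right(age_index, (hi, float("inf")))
    return [pk for _, pk in age_index[start:end]]


print(find_by_age_range(30, 35))  # [1, 3]
```

The trade-off is that every insert or update to `users` must also update `age_index`, which is the write cost that any secondary index adds.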


Big Data Processing: Batching vs. Streaming

Intro: In data processing, we often have to work with large amounts of data. That data is processed in a few variants: batching, where we aggregate a collection of data (e.g., by hour); streaming, for data that needs to be processed in real time; and a unified variant that does not distinguish between batching and streaming at all, letting you use the same API for both.
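A small Python sketch of the "same API for both" idea, assuming a hypothetical event schema of `(timestamp_seconds, event_type)` tuples that arrive in timestamp order. One generator computes hourly counts and works unchanged on a finite list (batch) or an endless generator (stream). This only illustrates the concept; it is not how any particular unified engine is implemented.

```python
from collections import Counter
from itertools import count
from typing import Iterable, Iterator, Optional, Tuple

Event = Tuple[int, str]  # hypothetical schema: (timestamp in seconds, event type)


def hourly_counts(events: Iterable[Event]) -> Iterator[Tuple[int, Counter]]:
    """Emit (hour, counts) each time an hourly window closes.

    Assumes events arrive in timestamp order, so a window can be emitted
    as soon as the first event of the next hour shows up.
    """
    current_hour: Optional[int] = None
    counts: Counter = Counter()
    for ts, kind in events:
        hour = ts // 3600
        if current_hour is not None and hour != current_hour:
            yield current_hour, counts  # window closed: emit it
            counts = Counter()
        current_hour = hour
        counts[kind] += 1
    if current_hour is not None:
        yield current_hour, counts  # flush the final window (finite input only)


# Batch: a bounded collection processed all at once.
batch = [(0, "click"), (10, "view"), (3600, "click"), (3700, "click")]
print([(h, dict(c)) for h, c in hourly_counts(batch)])
# [(0, {'click': 1, 'view': 1}), (1, {'click': 2})]

# Stream: an unbounded source; windows are emitted lazily as they close.
endless = ((t * 1800, "tick") for t in count())
print(next(hourly_counts(endless)))  # (0, Counter({'tick': 2}))
```

Because the function only pulls one event at a time, it never needs the whole input in memory, which is what lets the same code serve the streaming case.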


Webpack: Usage Examples

Webpack has been around since 2012 and is a very popular tool today. You'll see it mentioned in a lot of front-end stacks. I've personally used it to power this blog and a handful of my own React projects, such as https://classic-ah. ...


Web Development 101

HTTP vs. HTTPS: HTTP stands for Hypertext Transfer Protocol. It typically runs on TCP port 80. It is a protocol for transferring data such as webpages between browsers and servers. One major flaw with HTTP is that it is unencrypted, which leaves it vulnerable to man-in-the-middle attacks.