@rkenmi - Search Results

Data stores in Software Architectures

... are many ways to store your data. In this article we'll walk through some examples of data storage in common system designs. Reminder: there is no single best storage choice; the right one can vary heavily depending on factors such as access patterns and scale.

July 19, 2021

Comparison Charts of File Storage Formats

Big Data Encodings: These encodings are often used with HDFS or another distributed file system. Since the data can be as large as terabytes or petabytes, it is crucial to encode files in a space-efficient way that also allows them to be read or written efficiently.

January 24, 2022

A primer on MapReduce

... across multiple machines. For example, if you have to upload a 5 PB (petabyte) file into a storage service, but none of the servers have enough storage space for that, then what do you do? One tactic is to split the gigantic file into small chunks, and have multiple machines hold those chunks, rather than just having one single computer hold everything.

July 11, 2020

Design Concepts

... content here. A good estimate here is to use the bandwidth numbers to work out how much storage is needed to cache an entire day's worth. API: The API will vary depending on your application, but some good questions to think about: How flexible is your API? Does it provide choices to the user? How do you prevent abuse of the API? (Hint: provide users with an API dev key.) Database Design: It is good to start out with a sample schema for the data you need to store.

November 20, 2019

Distributed scaling with Relational Databases

... request comes in, the closest read replica (by region) can be chosen, for faster reads. Most storage engines come with at least two replication modes: single-leader replication and multi-leader replication.

January 19, 2022

Big Data Cheat Sheet

... features: write SQL queries for ETL and analytical data; access files from HDFS or other data storage systems like HBase; a mechanism to impose structure on a variety of table formats. Hive introduced one of the earliest concepts of an open table format.

September 3, 2024

Big Data Processing: Batching vs. Streaming

... them in memory. Note: Keep in mind that Spark does not come with its own distributed data storage. This means that Spark can be used with Hadoop, which can still use HDFS for the input and output of workflows.

January 3, 2022

NFT from a Software Developer's perspective

... be untrustworthy and centralized HTTP links, or a link to darknet services / P2P distributed storage (e.g. IPFS). What is it not? There are some misconceptions about NFTs.

December 26, 2021

Cyclic Permutation

... first, the key observation here is that without an additional buffer or temporary storage, it's difficult to determine which values to swap. Consider the following scenario: A = ['a', 'b', 'c', 'd'] and P = [1, 3, 2, 0]. Suppose that we iterate through A or P, and i is at index 0.

September 11, 2017

Virtual Memory

... address space, assigned to a logical partition, that the operating system perceives as its main storage. Therefore, if you have multiple applications running side-by-side, one application's logical memory view would not show memory usage by other applications.

September 27, 2020
