Search Results


12 matches found for 'storage'

Data stores in Software Architectures

... are many ways to store your data. In this article we'll walk through some examples of data storage in common system designs. Reminder: There is no single best storage choice; the right one varies heavily with factors such as access patterns and scale.


Comparison Charts of File Storage Formats

Big Data Encodings: These encodings are often used with HDFS or another distributed file system. Since the data can be as large as terabytes or petabytes, it is crucial to encode files in a space-efficient way that also allows them to be read and written efficiently.


A primer on MapReduce

... across multiple machines. For example, if you have to upload a 5 PB (petabyte) file into a storage service, but none of the servers have enough storage space for that, then what do you do? One tactic is to split the gigantic file into small chunks, and have multiple machines hold those chunks, rather than just having one single computer hold everything.
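The chunking tactic described in this snippet can be sketched roughly as follows. This is only an illustration under stated assumptions: the helper names `split_into_chunks` and `assign_round_robin`, the machine names, and the round-robin placement are all hypothetical and are not taken from the article.

```python
from typing import Dict, List


def split_into_chunks(data: bytes, chunk_size: int) -> List[bytes]:
    """Split data into consecutive chunks of at most chunk_size bytes."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def assign_round_robin(chunks: List[bytes], machines: List[str]) -> Dict[str, List[int]]:
    """Map each machine name to the indices of the chunks it would hold.

    Round-robin placement is an assumption made for this sketch; real
    systems typically also replicate chunks and balance by free space.
    """
    placement: Dict[str, List[int]] = {name: [] for name in machines}
    for index in range(len(chunks)):
        placement[machines[index % len(machines)]].append(index)
    return placement


# Tiny stand-in for the "gigantic file" (hypothetical data and node names).
chunks = split_into_chunks(b"abcdefghij", chunk_size=4)
placement = assign_round_robin(chunks, ["node-a", "node-b"])
```

Here the ten-byte payload becomes three chunks, and no single machine has to hold all of them.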


Design Concepts

... content here. A good estimate here is to use the bandwidth numbers to come up with the amount of storage needed to cache an entire day's worth. API: The API will vary depending on your application, but some good pointers to think about: How flexible is your API? Does it provide choices to the user? How do you prevent abuse of the API? (Hint: provide users with an API dev key.) Database Design: It is good to start out with a sample schema for the data you need to store.


Distributed scaling with Relational Databases

... request comes in, the closest read replica (by region) can be chosen, for faster reads. Most storage engines come with at least two replication modes: single-leader replication and multi-leader replication.


Big Data Processing: Batching vs. Streaming

... them in memory. Note: Keep in mind that Spark does not come with its own distributed data storage. This means that Spark can be used with Hadoop, which can still use HDFS for the input and output of workflows.


NFT from a Software Developer's perspective

... be untrustworthy and centralized HTTP links, or a link to darknet services / P2P distributed storage (e.g. IPFS). What is it not? There are some misconceptions about NFTs.


Cyclic Permutation

... first, the key observation here is that without an additional buffer or temporary storage, it's difficult to determine which values to swap. Consider the following scenario: A = ['a', 'b', 'c', 'd'], P = [1, 3, 2, 0]. Suppose that we iterate through A or P, and i is at index 0.
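For contrast, the easy case that does use an additional buffer might look like the sketch below. It assumes that `P[i]` is the destination index of `A[i]`; the snippet is truncated before it states the convention, so that reading is an assumption, and the function name `apply_permutation_with_buffer` is hypothetical.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def apply_permutation_with_buffer(values: Sequence[T], perm: Sequence[int]) -> List[T]:
    """Return a new list in which values[i] has been moved to index perm[i].

    Assumption: perm[i] is the destination of values[i]. The extra list is
    the "additional buffer" that makes each move independent of the others.
    """
    if sorted(perm) != list(range(len(values))):
        raise ValueError("perm must be a permutation of 0..len(values)-1")
    buffer: List[T] = list(values)  # temporary storage of the same length
    for i, destination in enumerate(perm):
        buffer[destination] = values[i]
    return buffer


A = ['a', 'b', 'c', 'd']
P = [1, 3, 2, 0]
result = apply_permutation_with_buffer(A, P)  # ['d', 'a', 'c', 'b']
```

Because every write goes into the buffer rather than into `A`, no value is overwritten before it has been read, which is exactly the guarantee an in-place approach has to work harder to get.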


Virtual Memory

... address space, assigned to a logical partition, that the operating system perceives as its main storage. Therefore, if you have multiple applications running side-by-side, one application's logical memory view would not show memory usage by other applications.


Sharding Techniques

... are different variations out there, like Google Jump Hash, depending on your data storage needs.


NumPy vs. Pandas, and other flavors (Dask, Modin, Ray)

... Linear Algebra. The n-dimensional data structures are built for fast and optimal data access and storage. The downside, however, is that NumPy code can often be hard to read, debug, and reuse.


B-Trees vs. LSM Trees

... and non-relational databases such as Bitcask, MongoDB and SQLite4. A simple log-structured storage engine works by keeping an in-memory hash table (or hash index) that keeps track of keys and values.
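A minimal sketch of that idea, assuming the hash index maps each key to the position of its latest value in an append-only log. The class name `TinyLogStore` is hypothetical, and an in-memory `io.BytesIO` stands in for the on-disk log file to keep the example self-contained.

```python
import io
from typing import Dict, Tuple


class TinyLogStore:
    """Append-only log plus an in-memory hash index (illustrative only)."""

    def __init__(self) -> None:
        self._log = io.BytesIO()  # stand-in for an append-only file on disk
        self._index: Dict[str, Tuple[int, int]] = {}  # key -> (offset, length)

    def put(self, key: str, value: bytes) -> None:
        # Always append; older values for the same key stay in the log
        # until a compaction step (not shown) reclaims them.
        self._log.seek(0, io.SEEK_END)
        offset = self._log.tell()
        self._log.write(value)
        self._index[key] = (offset, len(value))

    def get(self, key: str) -> bytes:
        # One hash lookup, then one seek and read in the log.
        offset, length = self._index[key]
        self._log.seek(offset)
        return self._log.read(length)

    def log_size(self) -> int:
        return len(self._log.getvalue())


store = TinyLogStore()
store.put("user:1", b"alice")
store.put("user:1", b"alicia")  # overwrite by appending a newer record
latest = store.get("user:1")
```

Writes are sequential appends and reads need a single index lookup, at the cost of keeping every key in memory and leaving stale records in the log until they are compacted.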