Search Results


16 results matched

Virtual Memory

What is Virtual Memory? Virtual memory is an abstraction that gives each running application its own address space, with a swap file on your hard disk holding the pages that do not fit in physical memory. Memory is structured and managed in two different ways: paging and segmentation.
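To make the paging half concrete: under paging, a virtual address is split into a page number and an offset within that page. The sketch below assumes a 4 KB page size (the common default); `split_address` is a hypothetical helper for illustration, not a real OS API.

```python
import mmap

PAGE_SIZE = mmap.PAGESIZE  # the OS page size, typically 4096 bytes

def split_address(vaddr: int, page_size: int = PAGE_SIZE):
    """Split a virtual address into (page number, offset within page)."""
    return vaddr // page_size, vaddr % page_size

# 8195 = 2 * 4096 + 3, so this address lives at offset 3 of page 2
page, offset = split_address(8195, 4096)
```

The page number is what the OS translates (via page tables) to a physical frame or a swap-file slot; the offset is carried through unchanged.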


B-Trees vs. LSM Trees

... usage, most notably SQL databases. With a B-Tree indexing structure, data is written to disk in fixed-size page segments. These page segments are often about 4 KB in size and hold key-value pairs sorted by key.
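Keeping the keys in a page sorted is what makes lookups fast: a binary search within the page replaces a linear scan. Below is a minimal sketch of a single leaf page (the `Page` class is illustrative, not a real storage-engine structure); a real B-Tree would also split pages that exceed the fixed size.

```python
from bisect import bisect_left

class Page:
    """Simplified B-Tree leaf page: keys kept sorted for binary search."""
    def __init__(self):
        self.keys = []
        self.values = []

    def insert(self, key, value):
        i = bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            self.values[i] = value       # overwrite an existing key
        else:
            self.keys.insert(i, key)     # keep keys sorted on the way in
            self.values.insert(i, value)

    def lookup(self, key):
        i = bisect_left(self.keys, key)  # binary search, O(log n) per page
        if i < len(self.keys) and self.keys[i] == key:
            return self.values[i]
        return None

p = Page()
p.insert("b", 2)
p.insert("a", 1)
p.insert("c", 3)
```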


PostgreSQL - a powerhouse relational database

... database engine. It is also incredibly vertically scalable: adding more memory, CPU cores, and disk space gives it a significant boost in capabilities. On average commodity hardware, we can expect the ballpark performance to be as follows:

Read Performance
Index reads can range from 10,000 TPS - 20,000 TPS per CPU core
Complex join queries: around 1,000 - 2,000 TPS
Full table scan: naturally slow, but especially egregious if the records cannot fit into memory

Write Performance
Single-table INSERT INTO: ~5,000 TPS per CPU core
Single-table UPDATE (plus index updates): ~1,000 TPS per CPU core
Multi-table/index writes from complex transactions: ~100 TPS
Bulk operations: ~10,000 TPS

Bottlenecks
PostgreSQL performance on a single node will degrade if any of the following is true:
The size of all data cannot be contained in memory
Disk space is exceeded
Disk I/O: the write performance above is bounded by disk I/O constraints while writing to the WAL
Complex joins or updates to multiple tables/indexes are always going to be a bottleneck
If records exceed 10 million, queries and joins start to stagnate, and full-text search also starts to degrade

ACID Guarantees
PostgreSQL is widely known for its ACID properties: Atomicity, Consistency, Isolation, and Durability.
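The ballpark figures above lend themselves to a quick back-of-envelope check. The sketch below uses the snippet's rough numbers (assumptions, not benchmarks) to compare loading a million rows one INSERT at a time versus via bulk operations.

```python
# Ballpark figures from the text above -- rough assumptions, not benchmarks.
single_insert_tps = 5_000   # single-table INSERT INTO, per CPU core
bulk_tps = 10_000           # bulk operations (e.g. COPY / multi-row inserts)

rows = 1_000_000
seconds_single = rows / single_insert_tps  # one row per transaction
seconds_bulk = rows / bulk_tps             # batched writes
```

Even with these optimistic numbers, row-at-a-time loading takes twice as long, which is why bulk paths matter for large imports.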


Big Data Processing: Batching vs. Streaming

... relies on bounded data. Traditionally, the inputs for batch processing are files stored on disk. This is the case for MapReduce implementations such as Hadoop. These files may come from daily cronjobs or be exported from copies of OLTP (Online Transaction Processing) databases, such as a SQL database for inventory or customer purchases.


Apache Kafka and Event Streaming

... message broker. Log-based message brokers will, as the name implies, append entries to a log on disk, meaning they are durable. This is what allows Kafka (and other log-based message brokers) to replay events that might already have been consumed by another client.
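The replay property follows from the design: consuming an event only advances a per-consumer offset, while the entry itself stays on the log. A toy in-memory sketch (the `Log` class is illustrative; real Kafka persists the log to disk and partitions it):

```python
class Log:
    """Toy log-based broker: an append-only list plus per-consumer offsets."""
    def __init__(self):
        self.entries = []
        self.offsets = {}  # consumer id -> next offset to read

    def append(self, event):
        self.entries.append(event)  # events are only ever appended

    def poll(self, consumer):
        off = self.offsets.get(consumer, 0)
        batch = self.entries[off:]
        self.offsets[consumer] = len(self.entries)
        return batch

    def replay(self, consumer, offset=0):
        # Rewinding is just resetting the offset; the log itself is untouched.
        self.offsets[consumer] = offset

log = Log()
log.append("order-created")
log.append("order-paid")
first = log.poll("billing")      # billing consumer reads both events
log.replay("billing", offset=0)  # rewind to the start of the log
again = log.poll("billing")      # the same events are delivered again
```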


A primer on MapReduce

... not fit into RAM, and a single machine will not be able to hold the entire file on its hard disk either. Therefore, the entire file has to be split into pieces and scattered across multiple machines.
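Once the file is split across machines, each machine can map its own piece independently, and a reduce step merges the partial results. The classic word-count example, sketched in miniature (here the "machines" are just list elements):

```python
from collections import defaultdict
from itertools import chain

def map_phase(chunk):
    # Each machine maps its own piece of the file to (word, 1) pairs.
    return [(word, 1) for word in chunk.split()]

def reduce_phase(pairs):
    # A reducer sums the counts for each word across all machines.
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

chunks = ["the quick fox", "the lazy dog"]  # one file, split across machines
counts = reduce_phase(chain.from_iterable(map_phase(c) for c in chunks))
```

In a real MapReduce run, a shuffle step between the two phases routes all pairs with the same key to the same reducer; the single `reduce_phase` call above stands in for that.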


Working with Production at Amazon Retail Website

... can include any kind of metrics for the hosts that are vital to their uptime. For example, low disk space, high CPU usage, or high memory usage can indicate a server just waiting to crash and go down, causing your end users to suffer.


OS 101

... kernel, Linux kernel. Hypervisor: a hypervisor manages hardware resources such as CPU, memory, and disk space as an abstraction across multiple operating systems or (virtual) instances.


2PC - Two Phase Commit and Why it Sucks

... Prepare, Commit), it will record this state change onto the transaction log, which is also on disk. This is to provide a recovery mechanism in the case of failures such as system restarts on the Coordinator host.
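The key discipline is write-ahead: the coordinator logs each state change to durable storage before acting on it, so that after a restart it can read the log back and resume. A minimal sketch, assuming a simple JSON-lines log format (the `Coordinator` class and its methods are illustrative, not a real 2PC implementation):

```python
import json
import os
import tempfile

class Coordinator:
    """Toy 2PC coordinator: each state change is logged to disk before acting."""
    def __init__(self, log_path):
        self.log_path = log_path

    def record(self, txid, state):
        # Append and fsync so the entry survives a crash or system restart.
        with open(self.log_path, "a") as f:
            f.write(json.dumps({"tx": txid, "state": state}) + "\n")
            f.flush()
            os.fsync(f.fileno())

    def recover(self):
        # After a restart, the last logged state per transaction tells the
        # coordinator where to resume (e.g. keep telling participants to commit).
        last = {}
        with open(self.log_path) as f:
            for line in f:
                entry = json.loads(line)
                last[entry["tx"]] = entry["state"]
        return last

log_path = tempfile.NamedTemporaryFile(delete=False).name
c = Coordinator(log_path)
c.record("tx1", "prepare")
c.record("tx1", "commit")
```

Because "commit" was logged before any participant was told to commit, a restarted coordinator that sees it must finish the commit rather than abort.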


Web Development 101

... and core clock frequency has been capped at around 3 GHz for quite some time. You can expand your hard disk space and RAM, but even those have size limits. You can spend $1,000 on a single machine to get state-of-the-art equipment, but spending $10,000 more on that machine for the best-of-the-best isn't going to boost its performance nearly as much as that first $1,000 did.


Seattle Conference on Scalability: YouTube Scalability

... by 20%. Linux sees 5 volumes instead of 1 logical volume, allowing it to schedule disk I/O more aggressively. Same hardware. Eventually did database partitions to spread writes AND reads, partitioned by user.


Comparison Charts of File Storage Formats

... convenient way for developers and end-users, with less importance placed on data size (in memory or on disk). For this reason, these formats are typically human-readable. For example, CSV and TSV are very popular output formats for data analysts who may use programs like Microsoft Excel.


RDBMS Indexing

... for those new entries as well. This means that writes take longer and take up a lot more disk space. Therefore, if you are write-heavy, you might want to avoid over-indexing, or be wary of which fields you index.
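The write amplification is easy to see in miniature: every secondary index is one more structure that must be updated on each insert. A toy sketch (the `Table` class is illustrative; real databases update B-Tree index pages, not dicts):

```python
class Table:
    """Toy table: every secondary index must be updated on each write."""
    def __init__(self, indexed_fields):
        self.rows = []
        self.indexes = {f: {} for f in indexed_fields}

    def insert(self, row):
        self.rows.append(row)
        writes = 1  # the row itself
        for field, idx in self.indexes.items():
            # Each index maps a field value to the row positions holding it.
            idx.setdefault(row[field], []).append(len(self.rows) - 1)
            writes += 1  # one extra structure touched per index
        return writes

t = Table(["email", "city", "age"])  # three indexed fields
cost = t.insert({"email": "a@x", "city": "NYC", "age": 30})
```

One logical insert became four physical writes; dropping an unused index removes one of them.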


Big Data Cheat Sheet

... written to the HDFS (Hadoop Distributed File System) storage layer, whose files ultimately reside on the storage disks of the nodes. YARN is used to manage the nodes in a Hadoop cluster and schedule MapReduce tasks to the nodes appropriately.


Quick Numbers in Software Engineering Cheatsheet

... 1,000,000 ns (1,000 μs; 1 ms). Somewhat fast. HDD seek (i.e. 7200 RPM disk drives): 10,000,000 ns (10,000 μs; 10 ms). Read 1 MB seque
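The 10 ms seek figure above has a useful corollary: it caps how many random reads a spinning disk can serve per second. A quick back-of-envelope check:

```python
SEEK_NS = 10_000_000  # one HDD seek at 7200 RPM, per the table above
NS_PER_SECOND = 1_000_000_000

# If every random read costs a full seek, this is the throughput ceiling.
seeks_per_second = NS_PER_SECOND // SEEK_NS
```

On the order of 100 random reads per second is why random I/O workloads on HDDs fall off a cliff compared to sequential ones, and why SSDs (no seek arm) changed the picture.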


Data stores in Software Architectures

... for objects that are rarely fetched. An alternative approach is to store data on actual hard disks (e.g. Amazon EBS); however, there are major scaling drawbacks to this, as well as resiliency issues.