Search Results


5 matches found for 'MapReduce'

A primer on MapReduce

To understand MapReduce, a very popular backend technology, let's first look at Map and Reduce. Terminology: Map and Reduce are popular higher-order functions used in functional programming.
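As a rough illustration of the two higher-order functions mentioned in this snippet, here is a minimal Python sketch. The sample list and variable names are assumptions made for the example and do not come from the linked article.

```python
from functools import reduce

# Hypothetical sample input, chosen only for illustration.
numbers = [1, 2, 3, 4]

# map applies a function to every element and yields the transformed elements.
squares = list(map(lambda n: n * n, numbers))

# reduce folds a sequence into a single value by combining elements pairwise.
total = reduce(lambda acc, n: acc + n, squares, 0)

print(squares)  # [1, 4, 9, 16]
print(total)    # 30
```

Both functions take another function as an argument, which is what makes them higher-order.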


Sharding Techniques

... would be good to disable writes and only allow read access. Shoutouts: This is similar to how MapReduce is implemented, where there is a controller entity that manages which machines are in charge of processing certain chunks of a large file.


Big Data Cheat Sheet

... Hadoop is a framework for large-scale, distributed jobs that consists of these main components: MapReduce: jobs are distributed into a group of mapper tasks and then reduced (combined) into a single output. HDFS: a distributed file system used by Hadoop, which is shared across the Hadoop cluster.
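The mapper-then-reduce flow described in this snippet can be sketched in plain Python as a word count. This is a single-process illustration under stated assumptions, not Hadoop's API: the names `mapper`, `reducer`, and `run_job` and the sample chunks are hypothetical.

```python
from collections import Counter
from functools import reduce


def mapper(chunk: str) -> Counter:
    """Mapper task: count the words in one chunk of the input."""
    return Counter(chunk.lower().split())


def reducer(left: Counter, right: Counter) -> Counter:
    """Reducer: combine two partial counts into one."""
    return left + right


def run_job(chunks: list[str]) -> Counter:
    # On a real cluster each mapper would run on a different machine;
    # here they run one after another in a single process.
    partials = [mapper(chunk) for chunk in chunks]
    # Combine every partial result into a single output.
    return reduce(reducer, partials, Counter())


# Hypothetical input, already split into chunks.
chunks = ["big data big jobs", "data jobs data"]
result = run_job(chunks)
print(result["data"])  # 3
```

Because each mapper only sees its own chunk, the map phase can be spread across machines, and only the much smaller partial counts need to be combined at the end.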


Big Data Processing: Batching vs. Streaming

... we typically mean that we want to group up some data, run some kind of job/operation on it (e.g. MapReduce) and send the results to some output (e.g. a database or data warehouse). Batch processing is generally done at larger companies where huge amounts of metadata (thousands of petabytes) need to go through some transformation.


NoSQL - the Radical Databases

... and HBase is not the same thing. Hadoop consists of the Hadoop Distributed File System (HDFS), MapReduce, and a management bridge. HBase is a columnar NoSQL data store. HBase can be used on top of a Hadoop cluster to do performant random reads and writes, since HDFS doesn't support random read/write access.