Search Results


11 matches found for 'apache'

Big Data Processing: Batching vs. Streaming

Intro In data processing, we often have to work with large amounts of data. The way in which this data is gathered comes in a few variants: batching, where we aggregate a collection of data (e.g., by hourly time), streaming for data that needs to be processed in real-time, and a unified variant which simply does not distinguish the technical difference between batching and streaming, allowing you to programmatically use the same API for both.


A primer on MapReduce

To first understand this very popular backend technology called MapReduce, let's take a look at Map and Reduce. Terminology The terms Map and Reduce are actually very popular higher-order functions used in functional programming.


Seattle Conference on Scalability: YouTube Scalability

Notes Apache isn't that great at serving static content for a large number of requests vs. NetScaler load balancing Python is fast enough There are many other bottlenecks such as waiting for calls from DB, cache, etc.


Apache Kafka and Event Streaming

Introduction Apache Kafka is an open-source distributed event streaming platform. Traditional message brokers are based off of the JMS / AMQP standard. These message brokers focus on a pub/sub model where publishers write messages to a queue and the queue is consumed by subscribers.


Data stores in Software Architectures

Use Cases There are many ways to store your data. In this article we'll walk through some examples of data storage in common system designs. Reminder: There is no single best storage choice and they may vary heavily depending on things such as access patterns and scale.


Distributed scaling with Relational Databases

Background A lot of articles will talk about how to scale databases. Typically, they will talk about the purpose and the general idea of sharding and replication, but often times these topics are explained separately and not so much in conjunction.


Useful Links

This is a personal list of useful resources for improving web stacks, frameworks, development, UX, whatever that I come across! Software Engineering The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!) Encryption vs.


Traditional Message Queues vs. Log-based Message Brokers

Traditional Message Queues Traditional message queues are based off of the JMS / AMQP standard. These message brokers focus on a pub/sub model where publishers write messages to a queue and the queue is consumed by subscribers.


CAP Patterns

The CAP Theorem dictates that only two of its three characteristics can be guaranteed at any given time. Intro to CAP Consistency Every read will be based off of the latest write Availability Every request will be given a response, although the response data might be stale Partition Tolerance It can handle network partitions or network failures MTV's The Real World If your service is in the cloud, the P in Partitioning has to always be accounted for.


AWS and MLOps

Machine Learning Development Lifecycle The lifecycle of the machine learning development process often follows these steps: 1. Data Collection In this step, we fetch data from various sources. Common examples include a data lake, a data catalog, or streaming data (like Kafka, Kinesis).


Comparison Charts of File Storage Formats

Big Data Encodings These encodings are often used with HDFS or some other distributed file system. Since the data can be as large as terabytes or petabytes, it is crucial to encode files in a space optimal way and also allow themselves to be read or written in an optimal way.