Search Results


17 matches found for 'big data'

Big Data Processing: Batching vs. Streaming

... (hours or days), and 2. we need a reasonable framework to execute the transformation logic at big data scale. Input Batch processing relies on bounded data. Traditionally, the input for batch processing are files stored on disk.


Data stores in Software Architectures

... the messages from the message queue to return real time data back to the user Since we have big data with respect to time, we can use something like a Time Series Database (TSDB) such as InfluxDB, rather than Data Warehouse or Cassandra.


NoSQL - the Radical Databases

... traffic grows, relational databases have had significant struggles to scale up to the level of big data. NoSQL databases are generally equipped with features that allow it to easily scale, at the cost of having less reliable or consistent data.


Distributed scaling with Relational Databases

... alternatives to SQL (i.e. NoSQL) for situations where we need high write throughputs, or to store big data (petabytes and beyond) or have incredibly low latency (tens of milliseconds) on reads/writes.


Comparison Charts of File Storage Formats

Big Data Encodings These encodings are often used with HDFS or some other distributed file system. Since the data can be as large as terabytes or petabytes, it is crucial to encode files in a space optimal way and also allow themselves to be read or written in an optimal way.


Design Concepts

In this article, I want to go over some fundamental design concepts that are useful for coming up with system design. Requirements Functional Requirements Describes specific behaviors i.e. If a URL is generated, it is composed of a Base64 encoded alias Non-functional Requirements Describes architectural requirements i.


Scaling Instragram Infrastructure

Notes Sending notifications to a person whose photo you liked: RabbitMQ -> Celery Django / Python for web server / application PostgreSQL to store users, medias, friendships, etc. Master with multiple replicas, where reads happen on replicas (Master-Slave Replication) To deal with increased latency with writes, by batching requests wherever possible Replication lag from Master to slave replicas was not a big issue (for them) Cassandra NoSQL (wide column store) to store user feeds, activities, etc.


Tenets of Leetcode

Disclosure I'm a simple guy who did a handful of interviews at tech startups and enterprise companies until I landed an offer at FAANG. I'm also fairly involved with interview loops at my FAANG company.


Web Development 101

HTTP vs. HTTPS HTTP stands for Hypertext Transfer Protocol. It typically runs on TCP port 80. It is a protocol for sending data through browsers in the form of webpages and such. One major flaw with HTTP is that it is vulnerable to man in the middle attacks.


B-Trees vs. LSM Trees

B-Trees Modern databases are typically represented as B-Trees or LSM Trees (Log structured merge trees). B-trees are "tried and true" data structures that are popular in database usage, most notably SQL databases.


Authentications

Authentication Authentication means to verify who you are. Basic Auth Sensitive data required for login is encoded with Base64. Base64 is very easy to decode. Not recommended and probably the least secure authentication method, but easy to implement.


Misconceptions of Software Engineer interviews at FAANG

There are already so many articles and Quora/Reddit/Blind posts out there on how to interview prep to get into big companies. So instead of covering that in detail, I want to focus more on the angle of common misconceptions.


Atomic operations with Elasticsearch

Preface Elasticsearch is a distributed, open source search and analytics engine for all types of data, including textual, numerical, geospatial, structured, and unstructured. Key Terms: Document - Serialized JSON data.


Seattle Conference on Scalability: YouTube Scalability

Notes Apache isn't that great at serving static content for a large number of requests vs. NetScaler load balancing Python is fast enough There are many other bottlenecks such as waiting for calls from DB, cache, etc.


RDBMS Optimization

Indexing Probably the easiest tweak to implement. It can usually be done with one SQL command. However, an index should be made based on a good column. For example, if you are frequently querying your rows by timestamp, then the timestamp can be chosen for an index.


Javascript Essentials

Hoisting Hoisting is JavaScript's default behavior of moving declarations to the top. Given the following Javascript code, what is the expected output, and why? fcn2(); fcn1(); function fcn2(){ alert("hi"); } var fcn1 = function(){ alert("hey") } The expected output is a pop up alert that says "hi", followed by an error that fcn1 isn't defined.


CSR vs. SSR

Introduction 10 years ago, Amazon found that every 100ms of latency cost them 1% in sales. Google found an extra .5 seconds in search page generation time dropped traffic by 20%. A broker could lose $4 million in revenues per millisecond if their electronic trading platform is 5 milliseconds behind the competition.