Search Results

34 matches found for 'indexing'

Local Secondary Index vs. Global Secondary Index

Secondary Index Recall that a secondary index is used in databases to help speed up queries when we want to grab data from popular columns or if we want to do some type of key range lookup efficiently.

SQL: Indexing

Introduction As illustrated in this article, indexing is one of the easiest and most effective tweaks you can add to your SQL database. However, indexing might seem like magic, and you might also not be too sure which field to index in the first place.

Atomic operations with Elasticsearch

... of aliases here is that you can insert the documents into a separate index, and when the indexing operation is complete, you can simply map the alias to that new index, and it will near-instantly point to that index.

RDBMS Optimization

Indexing Probably the easiest tweak to implement. It can usually be done with one SQL command. However, an index should be made based on a good column. For example, if you are frequently querying your rows by timestamp, then the timestamp can be chosen for an index.

Queue Reconstruction by Height

Problem Suppose you have a random list of people standing in a queue. Each person is described by a pair of integers (h, k), where h is the height of the person and k is the number of people in front of this person who have a height greater than or equal to h.

Phone Number Mnemonics in Python

Recursion is a great way to come up with permutations of strings or elements. Not necessarily because of the performance (iterative styles are usually faster) but because of simplicity and organization - like many algorithms with functional programming languages like Scala, recursive functions in general look neater, nicer, and easier to categorize.

Random Permutation

Problem Given an array of integers, and an integer index \(k\), randomly permute the array up to index \(k\). Input \(A\) - Array of integers \(k\) - integer index representing the indices to permute up to Approach Key insights into random permutation with arrays: If only k elements are permuted in an array of size n, the algorithm for random permutation should only cost k in time complexity.

Cyclic Permutation

This is an array/list trick that saves about O(n) in space complexity by taking advantage of a given array with integers representing indices. For this example problem, suppose that we want to apply some permutation P to an array A.

Merge k sorted linked lists

Problem Merge \(k\) sorted linked lists and return it as one sorted list. Analyze and describe its complexity. Example: Input: [ 1->4->5, 1->3->4, 2->6 ] Output: 1->1->2->3->4->4->5->6 Input lists - an array of linked list pointers # Definition for singly-linked list.

Reversing sublists of singly linked lists

Singly linked lists can be tricky in its own right. Without the convenience of having a pointer to the previous node that is provided in doubly-linked lists, it can be really easy to be wasteful in performance with singly linked lists.

Design Concepts

In this article, I want to go over some fundamental design concepts that are useful for coming up with system design. Requirements Functional Requirements Describes specific behaviors i.e. If a URL is generated, it is composed of a Base64 encoded alias Non-functional Requirements Describes architectural requirements i.

Search in a rotated sorted array

var jsav = new JSAV("av-sym"); jsav.label("A sorted array rotated to the left by 3"); var arr = jsav.ds.array([25, 28, 29, 34, 1, 15, 20]); Problem Suppose an array sorted in ascending order is rotated at some pivot unknown to you beforehand.

Find the duplicate number

Problem Given an array nums containing \(n + 1\) integers where each integer is between \(1\) and \(n\) (inclusive), prove that at least one duplicate number must exist. Assume that there is only one duplicate number, find the duplicate one.

Algorithm Handbook

... heap based sorting algorithm which is typically implemented with arrays and indexing. This has potential issues if the dataset is gigantic and the array is allocated statically - priority queues are used as an alternative.

Buy and Sell Two Stocks

Problem Given an array of integers representing stock prices, with each index representing a different day, find the maximum profit by selling at most two stocks. The buy/sell dates must not overlap.

B-Trees vs. LSM Trees

... data structures that are popular in database usage, most notably SQL databases. With a B-Tree indexing structure, data is written onto the disk in fixed size page segments. These page segments are often about 4 KB in size, and have key value pairs sorted by the key.

Python String Tricks - Performance considerations

If a string is changed to become bigger, consider trying to find the maximum size of the final string early on in a quick pass (strive for \(O(n)\)) Consider working from the tail of the string and iterating backwards if, say, 1 character needs to be replaced with 2.

String Manipulation in Python 3+

There are a few ways to search for a string in Python (3+): String operations str.find(sub[, start[, end]])] "shadow walker".find('w') => 5 This gives you the index of the first match. While the index might be nice to have, do remember that strings are immutable.

Bloom Filter

A set data structure uses a hashing function to store values and to verify if a value exists. Bloom filters are similar in that it uses multiple hashing functions to store values and to verify if a value exists.

Data Sharding: Twitter Posts

Scenario Let's begin with a Twitter-like service that allows you to tweet new posts. The service has very high read and write traffic , we'll say ~10k read TPS, or transactions-per-second for starters.

Big Data Processing: Batching vs. Streaming

... data analysts may not pull every column in a query, these databases are also read-optimized by indexing singular columns, rather than rows with every single column. Data Warehouses and SQL databases are similar in that they use the same query interface (SQL).

Replace all occurrences of a space inside an array

Problem Given an array of characters, replace all occurrences of a space with another string. Assume that the array has enough space to contain all of the replacements. Input A - An array of characters which may or may not have spaces word - The string to replace spaces with.

Coin Change Denominations

Problem Given some amount of money integer, and an array of integer coins, calculate the total number of ways to make change. Approach Suppose that we want to find the total number of denominations of 5 cents, using any U.

Rotate a 2D Matrix

var jsav = new JSAV("set"); jsav.label("Before rotation").css({"color": "gray"}); var m = jsav.ds.matrix([[0, 1, 2], [3, 4, 5], [6, 7, 8]]); jsav.label("After rotation").css({"color": "green"}); var n = jsav.

Next Permutation

Problem Implement next permutation, which rearranges numbers into the lexicographically next greater permutation of numbers. If such arrangement is not possible, it must rearrange it as the lowest possible order (ie, sorted in ascending order).

Data stores in Software Architectures

... the few challenges we would have with scaling. We could scale the SQL using techniques such as indexing, federation, data denormalization (to have less join queries), and moving some data out to another data source (i.

Build a Trie in Python

Problem In computer science, a trie (pronounced "try"), also called digital tree, radix tree or prefix tree, is a kind of search treeā€”an ordered tree data structure used to store a dynamic set or associative array where the keys are usually strings.

Build a Binary Tree with Pre-order and In-order traversal data

Problem Given a pre-order traversal array of integers and an in-order traversal array of integers, construct a binary tree. Input preorder - array of integers inorder - array of integers # Definition for a binary tree node.

Compute the max. water trapped by a pair of vertical lines

Problem An array of integers naturally defines a set of lines parallel to the Y-axis, starting from x = 0 as illustrated in the figure above. The goal of this problem is to find the pair of lines that together with the X-axis "trap" the most water.

Scaling Instragram Infrastructure

Notes Sending notifications to a person whose photo you liked: RabbitMQ -> Celery Django / Python for web server / application PostgreSQL to store users, medias, friendships, etc. Master with multiple replicas, where reads happen on replicas (Master-Slave Replication) To deal with increased latency with writes, by batching requests wherever possible Replication lag from Master to slave replicas was not a big issue (for them) Cassandra NoSQL (wide column store) to store user feeds, activities, etc.

Kefir.js - Reactive Javascript

Background Kefir.js is a Reactive Programming library for JavaScript inspired by Bacon.js and RxJS, with focus on high performance and low memory usage. Kefir works with objects called observables. observables could be two things; a stream, or a property (not to be confused with a Javascript object property) Streams A stream is a sequence of events made available over time.

Summation with rationals

Given some lower bound a = na/da and upper bound nb/db, write a function that calculates the summation of f(nk/dk), where k is the index, n represents numerator, and d represents denominator. An example: if a =1/2 and b = 4/5, then you should calculate f(1/2)+f(2/3)+f(2/4)+ f(3/4)+ f(3/5)+f(4/5) def sum (f: Rational => Rational)(a: Rational, b: Rational) : Rational = { val k = a val total = new Rational(0, 1) def innerSum (f: Rational => Rational) ( k: Rational, total : Rational) : Rational = { if (a <= k && k <= b && k.

Decode number of ways

Problem A message containing letters from A-Z can be encoded into numbers using the following mapping: 'A' -> "1" 'B' -> "2" ... 'Z' -> "26" To decode an encoded message, all the digits must be grouped then mapped back into letters using the reverse of the mapping above (there may be multiple ways).

Distributed scaling with Relational Databases

... optimizations we talked about. For general optimizations: Apply techniques such as secondary indexing, federation, data denormalization. Cache as much as you can - it's not just there for fast lookup times, but also to reduce the number of read requests going to your database For higher read throughput: Use RDBMS replications (single-leader or multi-leader) Most RDBMS replications are done asynchronously, which means reads will be eventually consistent Use single-leader for replications within the same data center Use multi-leader for replications across different data centers For higher read throughput with strong consistency reads: Use single-leader replication with consensus algorithms to vote on the total ordering of operations i.