@rkenmi - Home

Misconceptions of ASCII and Unicode

Myth: ASCII characters take up one byte. ASCII represents printable characters using the numbers 32 through 126, which cover English letters (lowercase and uppercase), digits, and punctuation. The numbers 0 through 31, along with 127, are reserved for control characters.
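As a rough illustration of the ranges described above, the following Python sketch classifies a 7-bit ASCII code point and counts how many bytes a string occupies under a few encodings. The function names `classify_ascii` and `encoded_sizes` are hypothetical helpers invented for this example, not part of any library.

```python
def classify_ascii(code_point: int) -> str:
    """Classify a 7-bit ASCII code point as 'control' or 'printable'."""
    if not 0 <= code_point <= 127:
        raise ValueError("not a 7-bit ASCII code point")
    # 0-31 and 127 (DEL) are control characters; 32-126 are printable.
    if code_point < 32 or code_point == 127:
        return "control"
    return "printable"


def encoded_sizes(text: str) -> dict:
    """Return the number of bytes `text` occupies in a few encodings."""
    # utf-16-le is used so that no byte order mark is added to the count.
    return {enc: len(text.encode(enc)) for enc in ("ascii", "utf-8", "utf-16-le")}


if __name__ == "__main__":
    print(classify_ascii(ord("\n")))  # a control character
    print(classify_ascii(ord("A")))   # a printable character
    print(encoded_sizes("A"))         # byte count depends on the encoding
```

The size of an ASCII character in memory depends on the encoding that carries it: the same letter occupies one byte in UTF-8 but two bytes in UTF-16.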

utf-8, utf-16, ansi, strings, string, Computer Science, unicode, ascii

NumPy vs. Pandas, and other flavors (Dask, Modin, Ray)

NumPy: NumPy is a Python library for numerical computing that offers multi-dimensional arrays and indices as data structures, along with high-level math utilities. ndarray: The unique offering of NumPy is the ndarray data structure, which stands for n-dimensional array.

Python, ML, numpy, dask, pandas, modin, ray

Comparison Charts of File Storage Formats

Big Data Encodings: These encodings are often used with HDFS or another distributed file system. Since the data can be as large as terabytes or petabytes, it is crucial to encode files in a space-efficient way that also allows them to be read and written efficiently.

encodings, csv, tsv, parquet, avro, ORC

Traditional Message Queues vs. Log-based Message Brokers

Traditional Message Queues: Traditional message queues are based on the JMS or AMQP standards. These message brokers focus on a pub/sub model in which publishers write messages to a queue and subscribers consume messages from it.
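The publish and consume flow described above can be sketched with a toy in-memory queue. This is only an illustration under simplifying assumptions: `ToyQueue` is a hypothetical class, and real JMS or AMQP brokers add features such as acknowledgements, routing, and persistence that are not modeled here.

```python
from collections import deque


class ToyQueue:
    """A toy in-memory queue: messages are removed once consumed."""

    def __init__(self):
        self._messages = deque()

    def publish(self, message):
        # Publishers append messages to the tail of the queue.
        self._messages.append(message)

    def consume(self):
        # Subscribers take the oldest message; it is gone from the queue afterwards.
        if not self._messages:
            return None
        return self._messages.popleft()


if __name__ == "__main__":
    queue = ToyQueue()
    queue.publish("order-created")
    queue.publish("order-paid")
    print(queue.consume())
    print(queue.consume())
    print(queue.consume())  # nothing left to consume
```

In this sketch a message is delivered at most once: after `consume` returns it, no other subscriber can read it again.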

message queue, event streaming, message broker

Local Secondary Index vs. Global Secondary Index

Secondary Index: A secondary index is used in databases to speed up queries that fetch data by frequently queried columns or that perform key range lookups. Secondary indices are used in relational databases (e.
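To make the idea of a secondary index concrete, here is a minimal sketch that builds an in-memory index from a non-key column to the primary keys of matching rows. The `users` table, its columns, and the `build_secondary_index` helper are all hypothetical and exist only for illustration; real databases maintain such indices incrementally and on disk.

```python
from collections import defaultdict


def build_secondary_index(rows, column):
    """Map each value of `column` to the list of primary keys that hold it."""
    index = defaultdict(list)
    for primary_key, row in rows.items():
        index[row[column]].append(primary_key)
    return dict(index)


# Hypothetical table keyed by primary key.
users = {
    1: {"name": "Ann", "city": "Tokyo"},
    2: {"name": "Bob", "city": "Osaka"},
    3: {"name": "Cy", "city": "Tokyo"},
}

# Index on the non-key column "city".
city_index = build_secondary_index(users, "city")

if __name__ == "__main__":
    # Look up matching rows without scanning the whole table.
    for key in city_index.get("Tokyo", []):
        print(key, users[key]["name"])
```

The lookup touches only the rows listed under the requested value, which is the speedup a secondary index is meant to provide over a full scan.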

databases, horizontal partitioning, sharding, global secondary index, local secondary index