HTTP vs. HTTPS
HTTP stands for Hypertext Transfer Protocol. It typically runs on TCP port 80. It is the protocol that browsers and servers use to exchange webpages and other resources. One major flaw of plain HTTP is that traffic is sent unencrypted, which makes it vulnerable to man-in-the-middle attacks.
HTTPS stands for Hypertext Transfer Protocol Secure. It typically runs on TCP port 443. It is essentially the same as HTTP, except that the connection between the client and the server is encrypted with Transport Layer Security (TLS), the successor to the Secure Sockets Layer (SSL). During the TLS handshake, the server proves its identity with a certificate and the two hosts agree on a unique session key. All data sent in either direction is then encrypted and decrypted with that key. This defends against man-in-the-middle attacks; snoopers can still see the traffic between the two hosts, but the data is garbled and undecipherable. The snooper can't decrypt it, since only the two hosts share the session key.
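As a rough illustration of the client side, Python's standard ssl module can wrap a TCP socket in TLS. This is a minimal sketch, not a full HTTP client: the helper names make_client_context and fetch_status_line are made up for this example, and the host is left to the caller.

```python
import socket
import ssl


def make_client_context() -> ssl.SSLContext:
    """Return a TLS context that verifies the server certificate and hostname."""
    return ssl.create_default_context()


def fetch_status_line(host: str, port: int = 443) -> str:
    """Open a TLS connection, send a minimal HTTP request, return the status line."""
    context = make_client_context()
    with socket.create_connection((host, port), timeout=5) as raw_sock:
        # wrap_socket performs the TLS handshake before any HTTP data is sent.
        with context.wrap_socket(raw_sock, server_hostname=host) as tls_sock:
            request = f"HEAD / HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n"
            tls_sock.sendall(request.encode("ascii"))
            response = tls_sock.recv(4096).decode("iso-8859-1")
    return response.split("\r\n", 1)[0]
```

Everything written to tls_sock is encrypted with the key negotiated during the handshake, so the HTTP request itself looks the same as it would over port 80.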
MVC
MVC stands for Model View Controller.
It is an architectural pattern.
- Model: Represents data-related logic. It could be business data (e.g. a Customer), for example. Typically, the Model is backed by a database (e.g. NoSQL or an RDBMS).
- View: Represents the UI logic of the application. For example, Pug is a template engine for Node.js that handles the View. React is a JavaScript library for handling the View as well. Django Templates are also responsible for rendering the View. There are countless more examples.
- Controller: Acts as the interface between the Model and the View. It is mainly responsible for the logical operations that get the View to render something, or that store some new data in the Model (database).
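The three roles can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the class and function names are hypothetical, the "database" is an in-memory list, and the View renders plain text.

```python
class CustomerModel:
    """Model: owns the data (an in-memory list standing in for a database)."""

    def __init__(self):
        self._customers = []

    def add(self, name: str) -> None:
        self._customers.append(name)

    def all(self) -> list:
        return list(self._customers)


def render_customer_list(customers: list) -> str:
    """View: turns data into something presentable (plain text here)."""
    if not customers:
        return "No customers yet."
    return "\n".join(f"- {name}" for name in customers)


class CustomerController:
    """Controller: receives input, updates the Model, asks the View to render."""

    def __init__(self, model: CustomerModel):
        self.model = model

    def create_customer(self, name: str) -> str:
        self.model.add(name.strip())
        return render_customer_list(self.model.all())


controller = CustomerController(CustomerModel())
controller.create_customer("Ada")
page = controller.create_customer("  Grace ")
```

Note that the Model knows nothing about rendering and the View knows nothing about storage; only the Controller touches both.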
React
React is a library for MVC Views. In other words, its main purpose is to build user interfaces. With React, the view consists of entities called Components. The unique thing about these components is that they have state and a life-cycle, allowing for some creative and dynamic views. This means that if your components change over time (for example, a ticking timer text box), React lets you handle those changes in a convenient, easy way.
React stands out as an MVC View library in a couple of ways:
- It makes use of a Virtual DOM. This is an in-memory representation of the DOM that React compares against its previous version with a diffing algorithm before rendering. Since the DOM is essentially a tree, and most DOM elements (e.g. HTML div, span, etc.) usually stay the way they are, the Virtual DOM saves work by keeping as many unchanged DOM elements as possible and replacing only the elements that changed.
- It uses JSX templates. JSX is a syntax extension to JavaScript that was introduced with React. It is basically like the offspring of HTML and JavaScript. It is very convenient, although the syntax can often appear a bit sloppy, like spaghetti.
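To make the diffing idea concrete, here is a toy tree diff in Python. It is a sketch of the general technique only, not React's actual reconciliation algorithm; the node shape (a dict with tag, text, and children) and the patch tuples are assumptions made for this example.

```python
def diff(old, new, path="root"):
    """Return a list of patch operations that turn the `old` tree into `new`.

    Each node is a dict: {"tag": str, "text": str, "children": [nodes]}.
    """
    if old is None:
        return [("create", path, new["tag"])]
    if new is None:
        return [("remove", path, old["tag"])]
    if old["tag"] != new["tag"]:
        # Different element type: replace the whole subtree.
        return [("replace", path, new["tag"])]

    patches = []
    if old.get("text") != new.get("text"):
        patches.append(("set_text", path, new.get("text")))

    old_children = old.get("children", [])
    new_children = new.get("children", [])
    for i in range(max(len(old_children), len(new_children))):
        old_child = old_children[i] if i < len(old_children) else None
        new_child = new_children[i] if i < len(new_children) else None
        patches.extend(diff(old_child, new_child, f"{path}/{i}"))
    return patches


before = {"tag": "div", "children": [
    {"tag": "span", "text": "Time: 10:00"},
    {"tag": "p", "text": "Static footer"},
]}
after = {"tag": "div", "children": [
    {"tag": "span", "text": "Time: 10:01"},
    {"tag": "p", "text": "Static footer"},
]}

patches = diff(before, after)
```

Only the span whose text changed produces a patch; the unchanged p element is left alone, which is the saving the Virtual DOM is after.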
Extras:
To use React to its greatest potential, Properties (or props), ideally a set of immutable values, are passed to a component's render function. A component should not directly modify any properties passed to it, but should be passed callback functions that instead modifies the store creating a single source of truth.
The above quote (taken from the Wikipedia page of React) is a good rule to follow for building performance-tuned user interfaces. For more information on the store, look up Flux or Redux, which are state management systems that work very well in conjunction with React.
API
A general rule around API protocols is that these should be treated similarly to tools in a toolbox. Each has its pros and cons and the choice to use one over the other depends on what functionality you want in your APIs.
RPC
Remote Procedure Calls (RPC) are, simply put, function calls that execute on a remote server. The idea of RPC is that, to the client, a remote call can be made in the same way as a call to a locally written function.
One issue with network communication is that the client and server may be written in different programming languages or with different network libraries. This is why RPC has client stubs and server stubs: they convert calls and results to and from a shared wire format, so that the client's input can be understood by the server, and the server's response can be understood by the client.
RPC is among the earliest API styles to appear in the software engineering world.
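The stub idea can be sketched without any RPC framework. In the example below, JSON plays the role of the shared wire format and a direct function call stands in for the network; ClientStub, server_stub, and the add procedure are all hypothetical names chosen for illustration.

```python
import json


# --- Server side -----------------------------------------------------------
def add(a, b):
    return a + b


PROCEDURES = {"add": add}


def server_stub(request_bytes: bytes) -> bytes:
    """Decode the request, call the real function, encode the result."""
    request = json.loads(request_bytes.decode("utf-8"))
    procedure = PROCEDURES[request["method"]]
    result = procedure(*request["params"])
    return json.dumps({"result": result}).encode("utf-8")


# --- Client side -----------------------------------------------------------
class ClientStub:
    """Makes a remote procedure look like a local method call."""

    def __init__(self, transport):
        # `transport` is any callable that carries bytes to the server and
        # returns the reply bytes; a real one would use a socket or HTTP.
        self._transport = transport

    def __getattr__(self, method_name):
        def remote_call(*params):
            request = json.dumps({"method": method_name, "params": list(params)})
            reply = self._transport(request.encode("utf-8"))
            return json.loads(reply.decode("utf-8"))["result"]
        return remote_call


# For the sketch, the "network" is a direct function call.
calculator = ClientStub(transport=server_stub)
total = calculator.add(2, 3)
```

From the caller's point of view, calculator.add(2, 3) reads like a local call, even though the arguments were serialized, dispatched by name, and the result deserialized on the way back.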
Pros:
- Using RPC frameworks like gRPC can give you higher network performance than SOAP/REST/GraphQL by leveraging Protobuf's encoding/decoding features to reduce the message payload size, which results in very efficient data payloads
- Protobuf in particular can also give you schemas to work with in your APIs, such as a schema for interfacing message payloads and allowing cross-language interoperability
- It can be suitable for APIs that represent commands (for example, Join a Channel, Leave a Channel).
Cons:
- RPC by itself has no built-in notion of schema (as SOAP has) or developer conventions (as REST has). It is a wild-west territory where anything goes.
- Endpoints can be named anything so the logic for the function call is up to the developer.
- You can't predict what happens to the final state after calling APIs when something like a network partition occurs
- Tight coupling to the underlying system.
- A lot more dev work to maintain (for the reasons mentioned above).
SOAP
SOAP (originally Simple Object Access Protocol) was created as an alternative to RPC in which verbose XML specifications dictate what messages an API sends. SOAP wraps every message in an Envelope. The envelope has a body that contains the request or response data, optional headers for message-specific rules, and a standard way to declare faults (errors).
SOAP services are typically described with WSDL (Web Services Description Language), an XML document that lists the operations a service offers and the messages they accept and return.
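For a feel of what an envelope looks like, the sketch below builds and parses one with Python's standard xml.etree.ElementTree. The envelope namespace is the SOAP 1.1 one; the application namespace, the GetPrice operation, and the Symbol element are hypothetical.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
APP_NS = "http://example.com/stock"  # hypothetical application namespace


def build_envelope(symbol: str) -> bytes:
    """Build a SOAP request whose body asks for a stock price."""
    ET.register_namespace("soap", SOAP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    ET.SubElement(envelope, f"{{{SOAP_NS}}}Header")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    request = ET.SubElement(body, f"{{{APP_NS}}}GetPrice")
    ET.SubElement(request, f"{{{APP_NS}}}Symbol").text = symbol
    return ET.tostring(envelope, encoding="utf-8")


def read_symbol(envelope_bytes: bytes) -> str:
    """Parse an envelope and pull the symbol back out of the body."""
    root = ET.fromstring(envelope_bytes)
    symbol = root.find(f"{{{SOAP_NS}}}Body/{{{APP_NS}}}GetPrice/{{{APP_NS}}}Symbol")
    return symbol.text


message = build_envelope("ACME")
```

Even this one-field request needs an envelope, a header, a body, and two namespaces, which gives a sense of where SOAP's reputation for verbosity comes from.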
Pros:
- Language and platform agnostic, similar to REST
- Security extensions (WS-Security) allow you to encrypt messages.
Cons:
- Too verbose.
- Messages are large in size due to the verbosity of XML specs.
- XML only. Tedious to update.
REST
REST is an architectural style and it stands for Representational State Transfer. REST-compliant Web services allow requesting systems to access and manipulate textual representations of Web resources using a uniform and predefined set of stateless operations.
What does "stateless operations" mean? It means that the server does not store your session state or history between requests. Each request is treated as new and must carry everything the server needs to process it. This is done for performance and simplicity. Note that stateless does not mean idempotent: GET, PUT, and DELETE are defined as idempotent, but POST is not.
Additionally, in REST, the media type of a representation (for example, JSON or XML) defines the semantics of the data. This is why the body of a REST response is just the raw data; meta information such as the media type travels separately, in headers like Content-Type. This allows for simple data retrieval, which scales very well as the number of concurrent operations grows.
A REST API should also be hypertext-driven: responses include links that tell the client which resources and actions are available next.
Pros:
- Language and platform agnostic, similar to SOAP
- A benefit over SOAP is the lightweight architectural style of REST. No more heavy XML schemas to maintain.
- Resource based.
- Simple to understand. A handful of HTTP verbs (GET, POST, PUT, DELETE, and occasionally PATCH) and the URI resource path are all that is needed to understand what the general behavior will be.
- Encourages statelessness, which tends to mean fewer bugs.
- Can be secured further using authentication tokens (e.g. OAuth).
Cons:
- REST APIs are resource-based, which can be limiting for more complex lookups. The style doesn't lend itself well to search-like operations, which often need more than one network call (or REST API call) to do the job.
- Versioning is tricky with REST, especially with APIs that may be deprecated but are still used by clients
- For the majority of APIs, REST has no problems. For performance-heavy APIs where minimal latency is a necessity, alternatives such as gRPC can be a better choice.
GraphQL
GraphQL is a more recent invention by Facebook that addresses a big concern with REST APIs: complex query operations often need multiple, redundant REST API calls.
GraphQL lets the client send a query that specifies exactly which fields it wants, and the server validates that query against a typed schema. For example, you can shape the query to filter your data a certain way, or fetch related objects in one request, somewhat like a SQL-style query.
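The core idea, that the client names the fields it wants and gets back only those, can be sketched in plain Python. This is not GraphQL itself and does not use any GraphQL library; the select_fields helper, the nested-dict selection format, and the sample record are assumptions made for illustration.

```python
def select_fields(data: dict, selection: dict) -> dict:
    """Return only the fields named in `selection`, recursing into nested objects.

    `selection` maps a field name to None (leaf) or to a nested selection dict.
    """
    result = {}
    for field, sub_selection in selection.items():
        value = data[field]
        if sub_selection is None:
            result[field] = value
        elif isinstance(value, list):
            result[field] = [select_fields(item, sub_selection) for item in value]
        else:
            result[field] = select_fields(value, sub_selection)
    return result


# Hypothetical server-side record with more fields than the client needs.
user = {
    "id": 1,
    "name": "Ada",
    "email": "ada@example.com",
    "posts": [
        {"id": 10, "title": "Hello", "body": "..."},
        {"id": 11, "title": "World", "body": "..."},
    ],
}

# Roughly what the GraphQL query `{ name posts { title } }` asks for.
selection = {"name": None, "posts": {"title": None}}
response = select_fields(user, selection)
```

With a typical REST design, the same result might take one call for the user and another for the user's posts, each returning fields the client then throws away.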
Pros:
- One or more operations can be batched into one GraphQL call. This reduces the number of network calls needed for complex query operations compared with traditional REST.
- Great developer-to-developer experience between front-end teams and API backend teams.
- A declarative query language backed by a lightweight schema definition language (SDL), with responses returned as JSON.
- No need to deal with potential versioning issues of REST.
Cons:
- Learning curve; not as simple as REST APIs for implementation
- It can't be used to replace REST entirely, especially for push style API operations. GraphQL shines for lookup operations instead.
HTTP Verbs
- GET: look up the resource identified by the given URI. (fetch)
- POST: take this data and apply it to the resource identified by the given URI, following the rules you documented for the resource media type. (create)
- PUT: replace whatever is identified by the given URI with this data, ignoring whatever is in there already, if anything. (update)
- PATCH: if the resource identified by the given URI still has the same state it had the last time I looked, apply this diff to it.
- DELETE: Remove a given resource.
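The differences between the verbs are easiest to see against a tiny in-memory store. The sketch below is not an HTTP server; ResourceStore and its methods are hypothetical stand-ins for what a handler behind each verb might do, and the PATCH shown is a simple unconditional partial update.

```python
class ResourceStore:
    """In-memory stand-in for a REST resource collection, keyed by URI path."""

    def __init__(self):
        self._items = {}
        self._next_id = 1

    def get(self, uri):
        return self._items.get(uri)  # safe: never changes state

    def post(self, collection_uri, data):
        uri = f"{collection_uri}/{self._next_id}"  # server picks the new URI
        self._next_id += 1
        self._items[uri] = dict(data)
        return uri

    def put(self, uri, data):
        self._items[uri] = dict(data)  # full replacement

    def patch(self, uri, changes):
        self._items[uri].update(changes)  # partial update

    def delete(self, uri):
        self._items.pop(uri, None)


store = ResourceStore()
uri = store.post("/customers", {"name": "Ada", "tier": "free"})
store.put(uri, {"name": "Ada Lovelace", "tier": "free"})
store.put(uri, {"name": "Ada Lovelace", "tier": "free"})  # repeating PUT changes nothing
store.patch(uri, {"tier": "pro"})
customer = store.get(uri)
second_uri = store.post("/customers", {"name": "Grace"})  # repeating POST creates another
store.delete(second_uri)
```

Repeating the PUT leaves the store in the same state, while each POST creates a new resource; that is the practical meaning of PUT being idempotent and POST not.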
Dependency Injection
Dependency injection is a technique in which an object receives its dependencies from outside instead of creating them itself. It is particularly common in OOP.
There are three main ways to do dependency injection with pros and cons for each.
Constructor injection
Pass in the dependency when you initialize an object.
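from engine import HondaEngine  # hypothetical module providing the dependency

class Car:
    def __init__(self, engine):
        # The dependency is supplied once, at construction time.
        self.engine = engine

engine = HondaEngine()
honda = Car(engine)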
This is simple and it is great if you want to make an object read-only and thread-safe after its construction. The downside here is that if you choose to update or replace the dependencies, then you'll have to add some kind of functionality to do so.
Setter injection
Use a setter method to set the dependency in the object.
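from engine import HondaEngine  # hypothetical module providing the dependency

class Car:
    def __init__(self):
        self.engine = None  # stays None until set_engine is called

    def set_engine(self, engine):
        self.engine = engine

engine = HondaEngine()
my_honda = Car()
my_honda.set_engine(engine)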
This is another simple way to set dependencies and a very straightforward one at that. It is flexible since it allows you to update dependencies whenever you want. The downside, however, is that when you have multiple dependencies, it's difficult to tell whether all of them are satisfied, because each one can be set independently. Also, a dependency can remain unset (None in Python, null in other languages) if set_engine is never called, so you have to add some way to validate that all your dependencies have been set.
Interface injection
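This is a slightly trickier approach where the dependency itself acts as the injector: the client implements an interface that exposes an injection method, and the dependency uses that interface to hand itself to any client passed to it.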
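A minimal sketch of interface injection, in the same Car and engine terms as the earlier examples. The EngineAware interface and the inject_into method are hypothetical names, and HondaEngine is defined inline here so the example is self-contained.

```python
from abc import ABC, abstractmethod


class EngineAware(ABC):
    """Interface that any client wanting an engine must implement."""

    @abstractmethod
    def inject_engine(self, engine) -> None:
        ...


class HondaEngine:
    """Hypothetical dependency that doubles as its own injector."""

    def start(self) -> str:
        return "Honda engine started"

    def inject_into(self, client: EngineAware) -> None:
        # The dependency hands itself to the client through the interface.
        client.inject_engine(self)


class Car(EngineAware):
    def __init__(self):
        self.engine = None

    def inject_engine(self, engine) -> None:
        self.engine = engine


honda = Car()
HondaEngine().inject_into(honda)
status = honda.engine.start()
```

The benefit is that the injector only needs to know about the EngineAware interface, not about Car; the cost is one more interface to define and implement for every kind of dependency.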
What is scaling in the context of web servers?
Web servers are scaled so that they can handle more load as traffic increases by large amounts.
There are two basic types of scaling: horizontal scaling and vertical scaling.
Scalability
Scaling is all about making sure that our applications keep performing well as the number of users on the applications increases.
Vertical scaling
Vertical scaling is basically making your server machine a lot more powerful. Imagine replacing a small-form-factor computer with a full ATX tower packed with RAM and a beefed-up CPU. But there is a predictable ceiling to this improvement, as the slowdown of Moore's Law hints. CPUs are already multi-core, and clock frequencies have hovered around 3 GHz for quite some time. You can expand your disk space and RAM, but even they have size limits. You can spend $1,000 on a single machine to get state-of-the-art equipment, but spending $10,000 more on that machine for the best of the best isn't going to boost performance anywhere near as much as the first $1,000 did.
Horizontal scaling
Horizontal scaling is the answer to the physical limits of vertical scaling, as well as a complementary decision given the ongoing trend of faster computers and cheaper prices. Horizontal scaling is all about lining up a row of computers and combining the power of all the computers in the row to deliver your server's needs.
There are many ways to distribute the workload of millions of user requests across rows of machines.
DNS Round Robin
One basic way is to use a DNS server that answers each lookup of your domain with the IP address of a different machine in the row, in sequence. For example, if user Rob looks up the domain, the DNS server returns the address of machine 1, so Rob's requests go there. Then user Kate looks up the domain; the DNS server now returns the address of machine 2. This is called DNS Round Robin, and it is a straightforward but somewhat naive concept. Although each machine gets its turn, DNS knows nothing about how busy each machine is, so some unfortunate machine might end up handling a heavyweight load (~70%) while other machines sit idle at about 15% load. The workload isn't quite balanced in this scenario.
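The rotation itself is easy to sketch. The example below is not a DNS server; RoundRobinResolver is a hypothetical class, and the addresses come from a range reserved for documentation.

```python
from itertools import cycle

# Hypothetical addresses for the machines in the row (documentation range).
SERVER_IPS = ["192.0.2.1", "192.0.2.2", "192.0.2.3"]


class RoundRobinResolver:
    """Hands out server addresses in a fixed rotation, ignoring server load."""

    def __init__(self, addresses):
        self._rotation = cycle(addresses)

    def resolve(self, hostname: str) -> str:
        # Every lookup gets the next address, regardless of how busy that machine is.
        return next(self._rotation)


resolver = RoundRobinResolver(SERVER_IPS)
answers = [resolver.resolve("example.com") for _ in range(4)]
```

The resolver never looks at what the machines are doing, which is exactly why the load can end up uneven.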
Load Balancers
Another method is to put a black box with its own IP address in front of the row, which distributes requests to all the machines in a balanced fashion. Popular hosting services like AWS or DigitalOcean provide load balancers, which can be very useful. Load balancers come in many flavors and forms, each with its own heuristics for how the load should be balanced.
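One common heuristic is "least connections": send each request to the machine that is currently handling the fewest. The sketch below assumes that heuristic and uses hypothetical names throughout; real load balancers track far more (health checks, weights, timeouts).

```python
class LeastConnectionsBalancer:
    """Sends each request to the backend with the fewest active connections."""

    def __init__(self, backends):
        self.active = {backend: 0 for backend in backends}

    def acquire(self) -> str:
        backend = min(self.active, key=self.active.get)
        self.active[backend] += 1
        return backend

    def release(self, backend: str) -> None:
        self.active[backend] -= 1


balancer = LeastConnectionsBalancer(["machine-1", "machine-2"])
first = balancer.acquire()    # both idle, ties go to the first backend listed
second = balancer.acquire()   # machine-1 is busy, so machine-2 is chosen
balancer.release(second)      # machine-2 finishes quickly
third = balancer.acquire()    # machine-2 is idle again and gets the next request
```

Unlike round robin, the choice depends on what each machine is doing right now, so a machine stuck on a slow request stops receiving new ones until it catches up.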
Extras:
If you are worried about distributed computing and handling things like user session state or cookies, consider storing that type of data (essentially user metadata) in a cache that is shared across all machines, or on a separate file server that is kept in sync with all of the machines.
CSRF
Cross-Site Request Forgery (CSRF) is an attack that tricks a user's browser into sending unwanted requests to a web application the user is logged in to, taking advantage of the user's cookies or other credentials.
How it works
Let's imagine the following scenario:
1. The user logs in to a website (http://localhost:8080) by clicking on the Login button.
2. The browser sends a POST request using an HTML form action.
3. The server receives the POST request and generates a "Successful Login" response. The response sets a cookie that contains an authentication token.
4. The browser shows the "Successful Login" page and stores the cookie locally for this domain (http://localhost:8080).
5. In the same browser, the user visits a malicious website and clicks on some funny links while still logged in to http://localhost:8080.
6. One of the links sends a POST request to the same website (http://localhost:8080) with malicious content inside.
7. The POST request goes through, and the browser automatically attaches the cookies for that domain (http://localhost:8080), because cookies are sent with every request to the domain that set them.
8. The malicious request now does some bad things, and it can do that because the server reads the cookie stored in step 4 and treats the request as authenticated.
9. This is BAD!
Takeaway #1: CSRF forges requests in your name. A page on a different domain (cross-site) takes advantage of your cookies to send malicious requests that the server believes came from you.
How to fix
Implementation-wise, a lot of server-side frameworks have built-in CSRF protection, because it's such a common attack. In some cases, you might need to write some code to enable it, but CSRF plugins and libraries keep that to a minimum.
How does the fix work?
The fix works by taking advantage of the fact that a page on another site cannot read your site's DOM (the browser's same-origin policy forbids it). The server embeds a hidden CSRF token in the pages it returns once the user is authenticated. From then on, every state-changing HTTP request made to the domain must also contain this CSRF token. The attacker's page can't read the DOM, so it can't learn the token, and its forged requests fail the CSRF check and are blocked.
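The token check can be sketched with Python's standard library. This is an illustration under simple assumptions, not any framework's implementation: the session is a plain dict standing in for server-side session storage, and the function names are hypothetical.

```python
import hmac
import secrets


def issue_csrf_token(session: dict) -> str:
    """Generate a random token, remember it server-side, return it for the form."""
    token = secrets.token_urlsafe(32)
    session["csrf_token"] = token
    return token


def is_request_allowed(session: dict, submitted_token) -> bool:
    """Accept a state-changing request only if it carries the session's token."""
    expected = session.get("csrf_token")
    if not expected or not submitted_token:
        return False
    # compare_digest avoids leaking information through comparison timing.
    return hmac.compare_digest(expected, submitted_token)


session = {}  # stand-in for the server-side session tied to the user's cookie
form_token = issue_csrf_token(session)

legitimate = is_request_allowed(session, form_token)
forged = is_request_allowed(session, None)  # attacker's form has no token
guessed = is_request_allowed(session, "not-the-real-token")
```

The cookie alone is no longer enough: the forged request carries the cookie automatically, but it cannot carry a token the attacker's page was never able to read.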
CORS
Cross-Origin Resource Sharing (CORS) is a mechanism that lets a server declare which other origins may make browser requests to fetch or update its resources. This standard allows two things:
- The server can allow a list of origins (scheme, host, and port, not IP addresses), so that requests from pages on these origins can access the resources. The server can also allow or forbid certain HTTP methods.
Access-Control-Allow-Origin: http://www.example.com
Access-Control-Allow-Methods: PUT, DELETE
- The client (browser) can send headers in a preflight (OPTIONS) request to tell the server which method and headers the real request will use.
Access-Control-Request-Method
Access-Control-Request-Headers
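On the server side, the preflight decision boils down to comparing the request's Origin and Access-Control-Request-Method against what the server allows. The sketch below assumes a hypothetical allowlist and a made-up helper name; it is not tied to any web framework.

```python
ALLOWED_ORIGINS = {"http://www.example.com"}  # hypothetical allowlist
ALLOWED_METHODS = {"PUT", "DELETE"}


def preflight_headers(origin: str, requested_method: str) -> dict:
    """Return the CORS headers for a preflight (OPTIONS) request, or {} to refuse."""
    if origin not in ALLOWED_ORIGINS or requested_method not in ALLOWED_METHODS:
        return {}
    return {
        "Access-Control-Allow-Origin": origin,
        "Access-Control-Allow-Methods": ", ".join(sorted(ALLOWED_METHODS)),
    }


allowed = preflight_headers("http://www.example.com", "PUT")
refused = preflight_headers("http://evil.example", "PUT")
```

When the response carries no Access-Control-Allow-Origin header, the browser blocks the cross-origin request on the client side; the enforcement is done by the browser, not the server.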
JWT
JSON Web Token (JWT) is a standard for creating access tokens, which represent a user's authenticated session with the server. The access token carries claims about the user, such as the user's identity, privileges, and groups.
A JWT is represented as one large string composed of base64url-encoded values. The header, the payload, and the signature are each base64url-encoded separately, then joined into one string with dots (header.payload.signature).
A JWT is typically generated and signed by the server. The header specifies which algorithm is used to sign the token.
The payload indicates information such as which user is logged in and the iat value, or "issued at" time. The payload is only encoded, not encrypted, so it should not hold secrets.
Lastly, the signature is computed over the encoded header and payload using a secret key that only the server knows (not the user's password). With an asymmetric algorithm, the server's private key is used instead.
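The structure is simple enough to build by hand with the standard library. The sketch below signs and verifies an HS256 token; it is for illustration only (production code should use a maintained JWT library and also validate claims such as expiry), and the secret and payload values are made up.

```python
import base64
import hashlib
import hmac
import json


def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWTs do."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def sign_jwt(payload: dict, secret: bytes) -> str:
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    )
    signature = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return f"{signing_input}.{b64url(signature)}"


def verify_jwt(token: str, secret: bytes) -> bool:
    signing_input, _, signature = token.rpartition(".")
    expected = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return hmac.compare_digest(b64url(expected), signature)


SECRET = b"server-side-secret"  # hypothetical; never the user's password
token = sign_jwt({"sub": "user-42", "iat": 1700000000}, SECRET)
```

Anyone can decode the header and payload, but only a holder of the secret can produce a signature that verify_jwt accepts, so a tampered payload is detected.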
SSR
See https://rkenmi.com/posts/csr-vs-ssr?lang=en
Polling
Coming soon
WebSockets
Coming soon
HTTP/2
Coming soon
See: https://stackoverflow.com/questions/36517829/what-does-multiplexing-mean-in-http-2 in the meantime.