Traditional Message Queues
Traditional message queues are based on the JMS / AMQP standards. These message brokers follow a pub/sub model: publishers write messages to a queue, and subscribers consume them from that queue. When a subscriber consumes a message, it acknowledges the consumption back to the broker, which then removes the message from the queue. In this sense, traditional message brokers are unlike databases: once a message is acknowledged it is deleted and cannot be read again later. They also give no strict ordering guarantees for the messages in the queue.
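A minimal sketch of that acknowledge-then-delete cycle, assuming a local RabbitMQ broker and the Python pika client (the queue name work_queue is made up for illustration):

```python
import pika

# Connect to a RabbitMQ broker assumed to be running on localhost.
connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()
channel.queue_declare(queue="work_queue")

# Publisher side: write a message to the queue.
channel.basic_publish(exchange="", routing_key="work_queue", body=b"do this work")

# Subscriber side: the broker removes the message only after it is ACK'ed.
def handle(ch, method, properties, body):
    print("received:", body)
    ch.basic_ack(delivery_tag=method.delivery_tag)  # message is now gone from the queue

channel.basic_consume(queue="work_queue", on_message_callback=handle)
channel.start_consuming()  # blocks, dispatching messages to handle()
```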
Log-based Message Brokers
When we look at something like Apache Kafka, a distributed event streaming platform, it deviates from these traditional message brokers: it is a log-based message broker. Log-based message brokers, as the name implies, append entries to a log on disk, meaning messages are durable. This is what allows Kafka (and other log-based message brokers) to replay events that might already have been consumed by another client, or to recover to a good state after machine failures.
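A rough sketch of replay, assuming a local Kafka broker and the kafka-python client with a hypothetical orders topic: a consumer can seek back to the start of a partition and re-read events that other clients have long since processed.

```python
from kafka import KafkaConsumer, TopicPartition

# Attach directly to partition 0 of the (hypothetical) "orders" topic.
consumer = KafkaConsumer(
    bootstrap_servers="localhost:9092",
    enable_auto_commit=False,
    consumer_timeout_ms=5000,  # stop iterating when no new messages arrive
)
partition = TopicPartition("orders", 0)
consumer.assign([partition])

# Replay: rewind to the first retained offset and re-read the log from there.
consumer.seek_to_beginning(partition)
for record in consumer:
    print(record.offset, record.value)
```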
In addition, Kafka scales by splitting each log into chunks, known as partitions, and distributing those partitions across multiple machines. Each partition lives on a single node (with replicas held by other nodes), and a node can host many partitions. This makes log-based message brokers highly scalable and distributed, with fault tolerance and replay capabilities.
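On the producer side, partitioning is usually driven by a message key. A small sketch with kafka-python (same assumptions as above, orders topic hypothetical): the default partitioner hashes the key, so every message with the same key lands in the same partition and stays ordered relative to other messages with that key.

```python
from kafka import KafkaProducer

producer = KafkaProducer(bootstrap_servers="localhost:9092")

# All events keyed by "customer-42" hash to the same partition of "orders",
# so they are appended (and later read) in the order they were sent.
for event in [b"cart created", b"item added", b"checkout"]:
    producer.send("orders", key=b"customer-42", value=event)

producer.flush()  # block until the broker has acknowledged the writes
```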
Comparison
These two message broker types have some things in common but also some important differences.
Message Queue (JMS / AMQP based)
- Examples: RabbitMQ, Amazon SQS
- Messages are typically not stored durably; once acknowledged, they are deleted
- Messages are inserted into and removed from the queue with low latency
- No ordering guarantee!
- Messages are ACK'ed by consumers before they are removed from the queue.
- Can parallelize at the message level.
Bottom line: Suited for processing messages that are bulky and where some delay is acceptable, i.e. offloading computationally expensive work asynchronously (see the worker sketch below). You generally don't care about the ordering of the messages as long as the work gets done. Also good if you want routing on a message-by-message basis.
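A sketch of that offloading pattern with pika, assuming a durable heavy_jobs queue (both the queue name and expensive_computation are placeholders): each worker process runs this loop and prefetches one message at a time, so adding more workers parallelizes the work message by message.

```python
import time
import pika

def expensive_computation(body):
    time.sleep(5)  # stand-in for the real long-running job

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()
channel.queue_declare(queue="heavy_jobs", durable=True)

# Hand each worker only one unacknowledged message at a time.
channel.basic_qos(prefetch_count=1)

def work(ch, method, properties, body):
    expensive_computation(body)
    ch.basic_ack(delivery_tag=method.delivery_tag)  # only now is the job removed

channel.basic_consume(queue="heavy_jobs", on_message_callback=work)
channel.start_consuming()
```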
Log-based Message Broker (a.k.a. event stream)
- Examples: Apache Kafka, Amazon Kinesis
- Writes append-only logs to disk. Durable. Offers event replay mechanisms and fault tolerance.
- High data throughput and distributed. The append-only logs are partitioned across however many machine nodes you throw at it.
- Ordered. (only at the partition level though - messages across multiple partitions for a single topic have no ordering)
Bottom line: Suited for messages that are quick to process, need some ordering guarantee (with the caveats above), and have very high data throughput requirements. Event replay and fault tolerance are also nice to have. Can be slower than traditional message queues due to potential disk reads, and routing is coarser than with message queues: consumers subscribe by topic/partition rather than having individual messages routed to them (see the consumer-group sketch below).
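To make the ordering caveat concrete, here is a sketch of a consumer-group reader with kafka-python (same hypothetical orders topic and group name): Kafka assigns each partition to one consumer in the group, and within an assigned partition records arrive strictly by offset, but nothing orders records across partitions.

```python
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "orders",
    bootstrap_servers="localhost:9092",
    group_id="order-processors",   # partitions are split among members of this group
    auto_offset_reset="earliest",
)

for record in consumer:
    # Offsets increase monotonically within record.partition,
    # but there is no global order across partitions of the topic.
    print(record.partition, record.offset, record.value)
```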