Processing and Acknowledging messages is a destructive and deletes the data from broker
If Consumer has been shut down or crashes, we need to be careful to delete the queue otherwise, messages will keep accumulating and take away memory from consumers that are still active????
Log based Message Broker
Producer message is appended at the end of the log
Consumer reads sequentially from the log
Consumer waits for new message at the end of the log
Unix tail -f also works the same
Logs can be partitioned
Each partition can be hosted on different nodes
A topic is group of partitions that all carry messages of same type
Within each partition, broker assigns offset
Offset is monotonically increasing sequence number
No order guarantee across different partitions
Broker assigns entire partitions to nodes in the consumer group
Each client consumes all messages from the assigned partition
Partitioning Key
decides which partition message will go to
Example: user_id can be chosen to send events for a particular use in correct order
Consumer Offset
Broker remembers the messages offset consumed in each consumer
If consumer node fails, another node is assigned to the partition and resume consuming
Slow message processing blocks later messages in the partition
It is also possible to change and is under consumer’s control
This can help process older data in certain timeframe
When Producer sends messages faster than consumers
It uses buffering with a large fixed size buffer
When Queue cannot fit in disk space
It overwrites old messages
Log implements a bounded size buffer and is aka Circular Buffer or Ring Buffer
If Consumer falls behind or missing messages, only that consumer is affected
If Consumer shuts down or crashes, it stops consuming resources
Consumer offset is still saved in partitions
Examples
Apache Kafka
Amazon Kinesis Streams
Twitter DistributedLog
In-memory vs Log based
JMS/AQMP style of message broker is preferred when
Message ordering is not important
Each message expensive to process
Parallelize processing
Queue storage
Messages are kept in memory by default.
If the queue grows too large, it spills over to disk
Throughput
High Throughput in the beginning when queue is short
When queue becomes large, throughput becomes much slower when start writing to disk
Processing messages causes message to be deleted
Log based message broker is preferred when
Message ordering is important
High Message Throughput
Each message is fast to process
Throughput is constant
Processing messages is done by reading a log which does not change
Message Brokers vs Databases
Message Brokers are not suitable for long term data storage
Message Brokers automatically delete data
Message Brokers assume queue data is small
Message Brokers don’t support indexing and searching