Messaging Systems

Streams

  • Streams Examples
    • stdin, stdout
    • FileInputStream in Java
    • TCP connections
    • Video/Audio streams

Event

  • aka Record
    • Self contained
    • immutable
    • contains timestamp
  • Encoded
    • Text
    • JSON
    • Binary
  • Examples
    • User Activity Events
    • Sensors
    • DB Writes

Producer

  • aka Publisher or Sender
  • Event is generated by Publisher

Consumer

  • aka Subscriber or Recipient
  • Event is consumed by Consumers
  • Multiple Consumers
    • Load Balancing
      • Each Message is delivered to one of the consumers
      • The messages may be delivered arbitrarily
      • Good for parallelizing heavy workloads
      • Example
        • shared subscription in JMS
    • Fan out
      • Each message is delivered to all of the consumers
      • Example
        • topic subscription in JMS
  • Acknowledgments
    • Each client acknowledges to the broker that the message has finished processing
    • It is possible that client finished processing but the acknowledgment got lost
      • Broker redelivers the message
      • Use Atomic Commit protocol to avoid issue
  • The combination of Load balancing and Retry inevitably cause out of order processing
    • Use separate queue per consumer instead of Load Balancing

Topic

  • aka Stream
  • Grouping of events

Messaging System criteria

  • What happens if Producer sends messages faster than consumers
    • System can drop messages
    • Buffer messages in a queue
      • What if queue cannot fit in memory
        • Does the system crash?
        • Does it write msgs to disk? Does performance have impact?
    • Apply backpressure (aka flow control)
      • prevent producer from sending more messages
      • Unix pipes and TCP uses backpressure
  • What happens if nodes crash or go offline, does the messages get lost?

Messaging Systems

  • Simple: Unix pipes and TCP
    • connects one publisher to only one consumer
  • Direct Messaging
    • UDP multicast for stock market feeds
    • TCP or IP multicast using brokerless libraries
      • ZeroMQ
      • nanomsg
    • Unreliable UDP for collecting metrics
      • StatsD
      • BruBeck
    • Webhooks
      • Works when consumer exposes an API
      • Consumer callback URL is registered with the producer
      • When event happens they are invoked
    • Limited Fault tolerance
  • Message Brokers

Message Brokers

  • aka Message Queue
  • Runs as a server
  • Durability of messages is moved to broker
  • Async in nature
  • Data may be kept in memory or disk
    • generally allows unbounded queueing
  • Standards
    • Java Message Service (JMS)
    • Advanced Message Queuing Protocol (AMQP)
  • Types
    • In-memory message broker
      • RabbitMQ
      • ActiveMQ
      • Amazon SQS
    • Log based message broker
      • Kafka
      • Amazon Kinesis
  • Implementations
    • RabbitMQ
    • ActiveMQ
    • Azure Service Bus
    • Google Cloud Pub/Sub
  • https://stackoverflow.com/questions/38444425/how-does-rabbitmq-actually-store-the-message-physically

In-memory Message Broker

  • Processing and Acknowledging messages is a destructive and deletes the data from broker
  • If Consumer has been shut down or crashes, we need to be careful to delete the queue otherwise, messages will keep accumulating and take away memory from consumers that are still active????

Log based Message Broker

  • Producer message is appended at the end of the log
  • Consumer reads sequentially from the log
  • Consumer waits for new message at the end of the log
    • Unix tail -f also works the same
  • Logs can be partitioned
    • Each partition can be hosted on different nodes
    • A topic is group of partitions that all carry messages of same type
    • Within each partition, broker assigns offset
      • Offset is monotonically increasing sequence number
    • No order guarantee across different partitions
    • Broker assigns entire partitions to nodes in the consumer group
    • Each client consumes all messages from the assigned partition
  • Partitioning Key
    • decides which partition message will go to
    • Example: user_id can be chosen to send events for a particular use in correct order
  • Consumer Offset
    • Broker remembers the messages offset consumed in each consumer
    • If consumer node fails, another node is assigned to the partition and resume consuming
    • Slow message processing blocks later messages in the partition
    • It is also possible to change and is under consumer’s control
      • This can help process older data in certain timeframe
  • When Producer sends messages faster than consumers
    • It uses buffering with a large fixed size buffer
  • When Queue cannot fit in disk space
    • It overwrites old messages
    • Log implements a bounded size buffer and is aka Circular Buffer or Ring Buffer
  • If Consumer falls behind or missing messages, only that consumer is affected
  • If Consumer shuts down or crashes, it stops consuming resources
    • Consumer offset is still saved in partitions
  • Examples
    • Apache Kafka
    • Amazon Kinesis Streams
    • Twitter DistributedLog

In-memory vs Log based

  • JMS/AQMP style of message broker is preferred when
    • Message ordering is not important
    • Each message expensive to process
    • Parallelize processing
    • Queue storage
      • Messages are kept in memory by default.
      • If the queue grows too large, it spills over to disk
    • Throughput
      • High Throughput in the beginning when queue is short
      • When queue becomes large, throughput becomes much slower when start writing to disk
    • Processing messages causes message to be deleted
  • Log based message broker is preferred when
    • Message ordering is important
    • High Message Throughput
    • Each message is fast to process
    • Throughput is constant
    • Processing messages is done by reading a log which does not change

Message Brokers vs Databases

  • Message Brokers are not suitable for long term data storage
    • Message Brokers automatically delete data
  • Message Brokers assume queue data is small
  • Message Brokers don’t support indexing and searching