ZooKeeper

  • ZooKeeper manages brokers (it maintains a list of all brokers in the cluster)
  • It helps with leader election for partitions
  • It notifies Kafka of changes (e.g. a new topic, a broker dying or coming up, topics being deleted…)
  • ZooKeeper runs as an ensemble with an odd number of servers (3, 5, 7…)
  • One ZooKeeper server is the leader (it handles writes from the brokers); the rest are followers (they handle reads from the brokers)
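The odd-number rule follows from ZooKeeper's majority quorum: an ensemble of n servers needs n // 2 + 1 of them to agree before a write is accepted. A small Python sketch of that arithmetic (the function names are illustrative, not part of any ZooKeeper API):

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest majority of an ensemble with `ensemble_size` voting servers."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many servers can fail while the rest can still form a majority."""
    return ensemble_size - quorum_size(ensemble_size)


for n in range(3, 8):
    print(f"{n} servers: quorum of {quorum_size(n)}, "
          f"survives {tolerated_failures(n)} failure(s)")
```

Going from 3 to 4 servers raises the quorum from 2 to 3 but still survives only 1 failure, and 6 servers survive only 2, the same as 5. An extra even-numbered server adds load without adding fault tolerance, which is why odd sizes are preferred.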

KRaft

  • In newer versions of Kafka, ZooKeeper is replaced by Kafka Raft (KRaft), which manages metadata and leader elections internally within Kafka
  • With KRaft, Kafka runs without a ZooKeeper dependency

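ISR (In-Sync Replicas)

  • The set of replicas (brokers) that are fully caught up with the partition leader
  • Only ISR members are candidates to become the new leader (unless unclean leader election is enabled)
  • If the leader dies, the controller broker (itself elected through ZooKeeper, or the KRaft controller quorum in newer versions) elects a new leader from the ISR
  • Because every ISR member holds all committed messages, the data stays replicated and available when the leader fails
  • More to discover:
    • High Watermark (HW): the offset up to which messages are stored on all ISR members; consumers can only read below it
    • Log End Offset (LEO): the offset of the next message to be appended to a replica's log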
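A minimal sketch of how the high watermark relates to each replica's log end offset, assuming the simplified rule HW = smallest LEO across the ISR. The `Replica` class and `high_watermark` function are hypothetical illustrations, not Kafka internals:

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Replica:
    broker_id: int
    log_end_offset: int  # LEO: offset the *next* appended message will get


def high_watermark(isr: list[Replica]) -> int:
    """HW: every offset below this value is stored on all in-sync replicas."""
    return min(r.log_end_offset for r in isr)


# Leader on broker 1 has written offsets 0..9; the follower on broker 3 is
# two messages behind but still counted as in sync.
isr = [Replica(1, 10), Replica(2, 10), Replica(3, 8)]
hw = high_watermark(isr)
print(f"HW = {hw}; consumers may read offsets 0..{hw - 1}")  # HW = 8; consumers may read offsets 0..7
```

Offsets 8 and 9 exist only on brokers 1 and 2, so they are not yet committed; once broker 3 catches up, the HW advances to 10.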

Broker

  • A Kafka server is called a broker
  • Each broker is identified by an integer ID
  • Each broker hosts a subset of the topic partitions
  • Once you connect to one broker, you are connected to the entire cluster
  • A cluster of 3 brokers is a good starting point

Kafka Cluster

  • A Kafka Cluster is composed of multiple brokers

Replication Factor

  • The number of copies kept of each partition, including the leader (e.g. 3)
  • It is set per topic when the topic is created
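To see how the replication factor spreads partitions over brokers, here is a simplified round-robin placement. Kafka's real assignment also randomizes the starting broker and can be rack-aware, so treat `assign_replicas` as a hypothetical sketch:

```python
from __future__ import annotations


def assign_replicas(num_partitions: int, broker_ids: list[int],
                    replication_factor: int) -> dict[int, list[int]]:
    """Place `replication_factor` copies of each partition on distinct brokers."""
    if replication_factor > len(broker_ids):
        # Kafka likewise refuses to create a topic whose RF exceeds the broker count
        raise ValueError("replication factor cannot exceed the number of brokers")
    n = len(broker_ids)
    return {
        # Round-robin: partition p starts at broker p % n; the first entry plays
        # the role of the preferred leader, the rest are followers.
        p: [broker_ids[(p + i) % n] for i in range(replication_factor)]
        for p in range(num_partitions)
    }


print(assign_replicas(3, [101, 102, 103], replication_factor=2))
# {0: [101, 102], 1: [102, 103], 2: [103, 101]}
```

Each broker ends up leading one partition and following another, so losing any single broker still leaves one copy of every partition.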

Kafka Broker Discovery

  • Every Kafka broker can act as a “bootstrap server”
  • You only need to connect to one broker and you will be connected to the entire cluster
  • Each broker knows about all brokers, topics and partitions
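The discovery step can be simulated in a few lines. This is a toy model, assuming every broker returns the same cluster-wide metadata; `ClusterMetadata`, `Broker`, and `discover` are hypothetical, while real clients receive the address list through the `bootstrap.servers` setting:

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ClusterMetadata:
    brokers: dict[int, str]  # broker id -> "host:port"
    topics: dict[str, int]   # topic name -> partition count


@dataclass
class Broker:
    address: str
    metadata: ClusterMetadata  # every broker holds the same cluster-wide view
    alive: bool = True

    def metadata_request(self) -> ClusterMetadata:
        if not self.alive:
            raise ConnectionError(f"{self.address} is unreachable")
        return self.metadata


def discover(bootstrap_servers: list[str],
             network: dict[str, Broker]) -> ClusterMetadata:
    """Try each bootstrap address in turn; the first live broker describes the whole cluster."""
    for address in bootstrap_servers:
        broker = network.get(address)
        if broker is None:
            continue
        try:
            return broker.metadata_request()
        except ConnectionError:
            continue  # try the next bootstrap server
    raise RuntimeError("no bootstrap server is reachable")


meta = ClusterMetadata(brokers={1: "b1:9092", 2: "b2:9092", 3: "b3:9092"},
                       topics={"orders": 3})
network = {addr: Broker(addr, meta) for addr in meta.brokers.values()}
network["b1:9092"].alive = False  # the first bootstrap server is down

found = discover(["b1:9092", "b2:9092"], network)
print(sorted(found.brokers))  # [1, 2, 3]; broker 3 was never in the bootstrap list
```

Listing two or three bootstrap servers rather than one lets the client start even if one of them is down.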

Kafka Guarantees

  • Messages are appended to a topic-partition in the order they are sent
  • Consumers read messages in the order stored in a topic-partition
  • With a replication factor of N, producers and consumers can tolerate up to N-1 brokers being down
  • A replication factor of 3 is a good idea:
    • Allows for one broker to be taken down for maintenance
    • Allows for another broker to be taken down unexpectedly
  • As long as the number of partitions remains constant for a topic (no new partitions), the same key will always go to the same partition
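A toy topic illustrates the ordering and key guarantees. It assumes CRC32 as a stand-in for the murmur2 hash used by Kafka's default partitioner for keyed records; `ToyTopic` and `partition_for` are hypothetical:

```python
from __future__ import annotations

import zlib


def partition_for(key: str, num_partitions: int) -> int:
    # Stand-in for Kafka's murmur2-based default partitioner: any stable hash
    # gives the same property (same key + same partition count -> same partition).
    return zlib.crc32(key.encode("utf-8")) % num_partitions


class ToyTopic:
    def __init__(self, num_partitions: int):
        # Each partition is an append-only list; list index == offset.
        self.partitions = [[] for _ in range(num_partitions)]

    def send(self, key: str, value: str) -> tuple[int, int]:
        p = partition_for(key, len(self.partitions))
        self.partitions[p].append((key, value))
        return p, len(self.partitions[p]) - 1  # (partition, offset)


topic = ToyTopic(num_partitions=3)
placed = [topic.send("user-42", event) for event in ("login", "click", "logout")]
print(placed)  # same partition for all three, offsets 0, 1, 2
print([v for _, v in topic.partitions[placed[0][0]]])  # ['login', 'click', 'logout']
```

Because `partition_for` depends on the partition count, adding partitions to the topic can send "user-42" somewhere else, which is exactly the caveat in the last bullet.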

Back pressure

  • Producer:
    • Batching
    • Buffering and Retry
  • Broker:
    • Segment Rotation
  • Consumer:
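    • Flow Control: because consumers pull data, they control their consumption rate through poll() (how often it is called and how many records it returns, e.g. max.poll.records)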
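Consumer-side flow control can be sketched as a pull loop that stops fetching while its local buffer is full. Everything here is a hypothetical in-memory simulation; in the Java client the analogous knobs are the `max.poll.records` setting and `KafkaConsumer.pause()`/`resume()`:

```python
from __future__ import annotations

from collections import deque


class ToyPartition:
    """Hypothetical in-memory stand-in for one topic-partition on a broker."""

    def __init__(self, records: list[int]):
        self.records = records
        self.position = 0  # next offset this consumer will fetch

    def poll(self, max_records: int) -> list[int]:
        batch = self.records[self.position:self.position + max_records]
        self.position += len(batch)
        return batch

    def exhausted(self) -> bool:
        return self.position >= len(self.records)


def consume(partition: ToyPartition, max_poll_records: int,
            buffer_capacity: int, processed_per_loop: int):
    if buffer_capacity < max_poll_records or processed_per_loop < 1:
        raise ValueError("buffer must fit one poll and processing must make progress")
    buffer = deque()
    processed, peak = [], 0
    while not partition.exhausted() or buffer:
        # Pull-based flow control: only fetch when a full poll still fits
        # (the real-client analogue is pausing/resuming the partition).
        if len(buffer) + max_poll_records <= buffer_capacity:
            buffer.extend(partition.poll(max_poll_records))
        peak = max(peak, len(buffer))
        # Slow downstream work drains only a few records per loop.
        for _ in range(min(processed_per_loop, len(buffer))):
            processed.append(buffer.popleft())
    return processed, peak


processed, peak = consume(ToyPartition(list(range(20))), max_poll_records=5,
                          buffer_capacity=8, processed_per_loop=2)
print(processed == list(range(20)), peak)  # True 8
```

The consumer never holds more than `buffer_capacity` records, however fast the broker could deliver them, and order within the partition is preserved.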

Handling Data Consistency When a Broker Goes Down

  • When a broker goes down, the leader of each partition is reassigned to another replica from the ISR (In-Sync Replica) set to ensure continued availability.
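  • Producers can combine acks=all with the topic setting min.insync.replicas, so that an acknowledged write is already stored on at least that many in-sync replicas and survives the leader's failure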
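The leader reassignment described above can be sketched as a function that prefers a live ISR member and only falls back to an out-of-sync replica when unclean election is allowed (the Kafka setting `unclean.leader.election.enable`, off by default in recent versions). The function itself is a hypothetical simplification of what the controller does:

```python
from __future__ import annotations


def elect_leader(replicas: list[int], isr: set[int], alive: set[int],
                 unclean_allowed: bool = False) -> int | None:
    """Choose a new leader for a partition whose leader has failed.

    replicas: assigned replica broker ids in preference order
    isr:      broker ids that were in sync with the old leader
    alive:    broker ids that are still reachable
    """
    # Normal case: first live replica that is still in sync, so no committed data is lost.
    for broker in replicas:
        if broker in isr and broker in alive:
            return broker
    # Unclean election trades consistency for availability: messages that
    # only the old leader had are lost.
    if unclean_allowed:
        for broker in replicas:
            if broker in alive:
                return broker
    return None  # partition stays offline until an in-sync replica returns


print(elect_leader([1, 2, 3], isr={1, 2}, alive={2, 3}))                      # 2
print(elect_leader([1, 2, 3], isr={1}, alive={2, 3}))                         # None
print(elect_leader([1, 2, 3], isr={1}, alive={2, 3}, unclean_allowed=True))   # 2
```

The second call shows the consistency-first default: with no in-sync replica left, the partition becomes unavailable rather than silently losing committed messages.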

Fault Tolerance and High Availability