ZooKeeper manages brokers (it maintains a list of all brokers in the cluster)
It helps with leader election for partitions
It notifies Kafka of changes (e.g. a new topic, a broker dying, a broker coming up, a topic being deleted…)
A ZooKeeper ensemble runs with an odd number of ZooKeeper servers (3, 5, 7…)
One ZooKeeper server is the leader (it handles writes from the brokers); the rest are followers (they handle reads from the brokers)
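The preference for an odd number of servers follows from majority-quorum arithmetic. The sketch below is only an illustration of that arithmetic, not ZooKeeper code; the function names are made up for this example.

```python
def quorum_size(servers: int) -> int:
    """Smallest number of servers that forms a strict majority."""
    return servers // 2 + 1


def tolerated_failures(servers: int) -> int:
    """Servers that can fail while a majority can still be formed."""
    return servers - quorum_size(servers)


for n in (3, 4, 5, 6, 7):
    print(f"servers={n} quorum={quorum_size(n)} tolerated_failures={tolerated_failures(n)}")
```

Going from 3 to 4 servers (or from 5 to 6) raises the quorum size without raising the number of failures the ensemble survives, which is why even sizes add cost but no extra fault tolerance.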
KRaft
In newer versions of Kafka, ZooKeeper is replaced by Kafka Raft (KRaft) for managing metadata and leader elections internally within Kafka
With KRaft, we don’t need a ZooKeeper dependency to run Kafka
ISR (In Sync Replicas)
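Set of replicas that are fully caught up with the partition leader (the leader itself is part of the ISR)
Every ISR member is a candidate to become the leader
If the leader dies, a new leader is elected from the ISR (in ZooKeeper-based clusters the Kafka controller performs the election and records the result in ZooKeeper)
The ISR guarantees that committed data is replicated and stays available in the event of a failure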
More to discover:
High Watermark (HW)
Log End Offset (LEO)
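As a starting point for those two terms: the Log End Offset is the offset of the next record to be written to a replica's log, and the High Watermark of a partition is the smallest LEO among its in-sync replicas. Consumers can only read records below the High Watermark. A minimal sketch of that relationship, with a made-up function name and made-up offsets:

```python
def high_watermark(log_end_offsets: dict, isr: set) -> int:
    """High watermark = smallest LEO among the in-sync replicas."""
    return min(log_end_offsets[broker_id] for broker_id in isr)


# Hypothetical state of one partition: broker id -> LEO.
leo = {1: 10, 2: 8, 3: 5}
isr = {1, 2}  # broker 3 has fallen behind and is not in the ISR

hw = high_watermark(leo, isr)
print(hw)  # 8: offsets 0..7 are replicated to every in-sync replica
```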
Broker
A Kafka server is called a broker
Each broker has an ID, which is an integer
Each broker holds certain topic partitions
After you connect to one broker, you are connected to the entire cluster
Three brokers is a good number to start with.
Kafka Cluster
A Kafka Cluster is composed of multiple brokers
Replication Factor
It is the number of copies (replicas) of each partition that Kafka keeps across the brokers
It is specified at the time of topic creation
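To make the replication factor concrete, the sketch below spreads the replicas of each partition over the brokers in a simple round-robin fashion. This is a simplified illustration, not Kafka's real placement algorithm (which also considers things like a randomized starting broker and racks); `assign_replicas` is a hypothetical helper.

```python
def assign_replicas(broker_ids, num_partitions, replication_factor):
    """Toy round-robin placement: partition -> list of broker ids holding a replica."""
    if replication_factor > len(broker_ids):
        # Each replica of a partition must live on a different broker.
        raise ValueError("replication factor cannot exceed the number of brokers")
    assignment = {}
    for partition in range(num_partitions):
        assignment[partition] = [
            broker_ids[(partition + i) % len(broker_ids)]
            for i in range(replication_factor)
        ]
    return assignment


print(assign_replicas([1, 2, 3], num_partitions=3, replication_factor=2))
# {0: [1, 2], 1: [2, 3], 2: [3, 1]}
```

With 3 brokers and a replication factor of 2, every partition exists on two different brokers, so losing any single broker still leaves one copy of each partition.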
Kafka Broker Discovery
Every Kafka broker is also called a “bootstrap server”
You only need to connect to one broker and you will be connected to the entire cluster
Each broker knows about all brokers, topics and partitions
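The discovery idea can be sketched with a toy in-memory model: because every broker knows the full cluster metadata, a client that reaches any one broker learns about all of them. The class, method, and host names below are invented for illustration and are not part of any Kafka client API.

```python
class ToyCluster:
    """In-memory stand-in for a cluster in which every broker holds the full metadata."""

    def __init__(self, brokers, topics):
        self.brokers = brokers  # broker id -> "host:port"
        self.topics = topics    # topic -> {partition: leader broker id}

    def metadata_from(self, bootstrap_id):
        """Ask a single (bootstrap) broker for metadata about the whole cluster."""
        if bootstrap_id not in self.brokers:
            raise ConnectionError(f"broker {bootstrap_id} is not reachable")
        return {
            "brokers": dict(self.brokers),
            "topics": {topic: dict(parts) for topic, parts in self.topics.items()},
        }


cluster = ToyCluster(
    brokers={1: "kafka-1:9092", 2: "kafka-2:9092", 3: "kafka-3:9092"},
    topics={"orders": {0: 1, 1: 2, 2: 3}},
)

# Connecting to broker 2 alone is enough to learn about brokers 1 and 3.
metadata = cluster.metadata_from(bootstrap_id=2)
print(sorted(metadata["brokers"]))  # [1, 2, 3]
```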
Kafka Guarantees
Messages are appended to a topic-partition in the order they are sent
Consumers read messages in the order stored in a topic-partition
With a replication factor of N, producers and consumers can tolerate up to N-1 brokers being down
A replication factor of 3 is a good idea:
Allows for one broker to be taken down for maintenance
Allows for another broker to be taken down unexpectedly
As long as the number of partitions for a topic stays constant (no new partitions), the same key always goes to the same partition (with the default partitioner)
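The key-to-partition guarantee comes from hashing the key and taking the result modulo the number of partitions. Kafka's default partitioner uses a murmur2 hash; the sketch below substitutes CRC32 from the Python standard library as a stand-in, so the partition numbers it prints are not the ones Kafka would choose. `partition_for` and `moved_keys` are hypothetical helpers.

```python
import zlib


def partition_for(key: bytes, num_partitions: int) -> int:
    """Map a key to a partition: stable hash of the key modulo the partition count."""
    return zlib.crc32(key) % num_partitions


def moved_keys(keys, old_count, new_count):
    """Keys whose partition changes when the partition count changes."""
    return [k for k in keys if partition_for(k, old_count) != partition_for(k, new_count)]


keys = [f"user-{i}".encode() for i in range(10)]

# Same key, same partition count: the result never changes between calls.
print(partition_for(b"user-42", 6) == partition_for(b"user-42", 6))  # True

# Changing the partition count may send existing keys to different partitions.
print(moved_keys(keys, old_count=6, new_count=8))
```

This is why adding partitions to a topic breaks per-key ordering for keys that move: older records for the key stay in the old partition while new ones land in another.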
Back pressure
Producer:
Batching
Buffering and Retry
Broker:
Segment Rotation
Consumer:
Flow Control: controls the rate of consuming via poll()
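The producer-side buffering and consumer-side flow control can be sketched with a toy bounded buffer. This is an in-memory model, not the Kafka client: `BoundedBuffer`, `try_send`, and this `poll` are made up for illustration. In the real clients the corresponding knobs are settings such as `buffer.memory` and `batch.size` on the producer and `max.poll.records` on the consumer.

```python
from collections import deque


class BoundedBuffer:
    """Toy pipeline: a fixed-capacity buffer between a producer and a consumer."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.records = deque()

    def try_send(self, record) -> bool:
        """Producer side: refuse the record when the buffer is full (caller waits or retries)."""
        if len(self.records) >= self.capacity:
            return False
        self.records.append(record)
        return True

    def poll(self, max_records: int) -> list:
        """Consumer side: take at most max_records, so the consumer sets its own pace."""
        batch = []
        while self.records and len(batch) < max_records:
            batch.append(self.records.popleft())
        return batch


buffer = BoundedBuffer(capacity=3)
accepted = [buffer.try_send(i) for i in range(5)]
print(accepted)            # [True, True, True, False, False]
print(buffer.poll(2))      # [0, 1]
print(buffer.try_send(5))  # True: polling freed space in the buffer
```

The producer is slowed down only when the buffer is full, and the consumer never receives more than it asked for in a single poll.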
Handling data consistency when a broker goes down
When a broker goes down, leadership of each partition it was leading is reassigned to another replica from the ISR (In-Sync Replicas) set to ensure continued availability.
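A minimal sketch of that failover for a single partition, assuming the new leader is the first surviving replica (in assignment order) that is still in the ISR. The function name and data layout are hypothetical; a result of `None` models the partition having no eligible leader.

```python
def elect_leader(replicas, isr, failed_broker):
    """Return (new_leader, new_isr) for one partition after failed_broker goes down."""
    # Only replicas that were in sync with the old leader are eligible,
    # which is what keeps already-committed data from being lost.
    surviving_isr = [b for b in replicas if b in isr and b != failed_broker]
    if not surviving_isr:
        return None, []
    return surviving_isr[0], surviving_isr


# Replicas on brokers 1, 2, 3 with broker 1 as the current leader.
print(elect_leader([1, 2, 3], {1, 2, 3}, failed_broker=1))  # (2, [2, 3])
print(elect_leader([1, 2, 3], {1, 3}, failed_broker=1))     # (3, [3])
print(elect_leader([1, 2, 3], {1}, failed_broker=1))        # (None, [])
```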