I have 3 brokers and 3 partitions. Each broker is the leader for one partition and is in the ISR for all of them.
Let us say the brokers run on ports 19092, 29092, and 39092 respectively:
19092 - partition 0
29092 - partition 1
39092 - partition 2
The Half-broker test:
I would like to call it this because the broker allows OUTPUT but not INPUT.
I added the following iptables rule:
iptables -A INPUT -p tcp --dport 29092 -j DROP
and ran the producer:
bin/kafka-console-producer --broker-list 10.54.8.172:19092 --topic ftest
The above iptables rule blocks INPUT access but doesn't stop the broker from reporting its liveness to ZooKeeper. So ZooKeeper does not consider it dead and does not trigger a leader election for partition 1.
But the producer cannot connect to it because of the rule, and hence throws an error:
org.apache.kafka.common.errors.TimeoutException: Expiring 1 record(s) for ftest-1: 1778 ms has passed since batch creation plus linger time
I did this manually, but there can be other reasons why INPUT access may be blocked (some malware, a DDoS, or anything else).
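To confirm that the rule really blocks inbound TCP connections (as opposed to the producer failing for some other reason), a plain socket probe can be run from the client side. This is only a minimal sketch: `can_connect` is a hypothetical helper name, and the 2-second default timeout is an arbitrary choice. With a DROP rule the connection attempt should time out rather than be refused.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection raises OSError (including timeouts) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling `can_connect("10.54.8.172", 29092)` for the blocked broker, and the same for 19092 and 39092, would show which listeners are reachable from the producer's machine.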
Before iptables RULE:
Metadata for ftest (from broker 1: 10.54.8.172:19092/1):
3 brokers:
broker 2 at 10.54.8.172:29092
broker 1 at 10.54.8.172:19092
broker 3 at 10.54.8.172:39092
1 topics:
topic "ftest" with 3 partitions:
partition 2, leader 3, replicas: 3,1,2, isrs: 3,1,2
partition 1, leader 2, replicas: 2,3,1, isrs: 2,3,1
partition 0, leader 1, replicas: 1,2,3, isrs: 1,2,3
After iptables RULE:
Metadata for ftest (from broker 1: 10.54.8.172:19092/1):
3 brokers:
broker 2 at 10.54.8.172:29092
broker 1 at 10.54.8.172:19092
broker 3 at 10.54.8.172:39092
1 topics:
topic "ftest" with 3 partitions:
partition 2, leader 3, replicas: 3,1,2, isrs: 3,1,2
partition 1, leader 2, replicas: 2,3,1, isrs: 2
partition 0, leader 1, replicas: 1,2,3, isrs: 1,2,3
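The difference between the two listings can also be picked out mechanically. The sketch below parses the partition lines as they appear in the output above and reports partitions whose ISR no longer matches the replica list. The function names are hypothetical, and the regular expression assumes exactly the line format shown here.

```python
import re

# Matches lines such as: "partition 1, leader 2, replicas: 2,3,1, isrs: 2"
PARTITION_RE = re.compile(
    r"partition (\d+), leader (\d+), "
    r"replicas: (\d+(?:,\d+)*), isrs: (\d+(?:,\d+)*)"
)


def parse_partitions(text):
    """Map partition id -> dict with leader, replicas and isrs."""
    parts = {}
    for m in PARTITION_RE.finditer(text):
        pid, leader, replicas, isrs = m.groups()
        parts[int(pid)] = {
            "leader": int(leader),
            "replicas": [int(x) for x in replicas.split(",")],
            "isrs": [int(x) for x in isrs.split(",")],
        }
    return parts


def under_replicated(parts):
    """Return sorted ids of partitions whose ISR is smaller than the replica set."""
    return sorted(
        pid for pid, p in parts.items()
        if set(p["isrs"]) != set(p["replicas"])
    )


# Partition lines copied from the "after" metadata listing.
AFTER = """\
partition 2, leader 3, replicas: 3,1,2, isrs: 3,1,2
partition 1, leader 2, replicas: 2,3,1, isrs: 2
partition 0, leader 1, replicas: 1,2,3, isrs: 1,2,3
"""

print(under_replicated(parse_partitions(AFTER)))  # [1]
```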
Since there is only one leader and it is dead (in the sense that it cannot receive any messages), isn't this a single point of failure?
I think there should ideally be two-way communication between ZooKeeper and the Kafka brokers, shouldn't there? Does Kafka allow it? If so, how?
Also, when port 29092 was blocked for INPUT access, the ISR for partition 1 shrank to a single broker.
It could be because broker 29092 is not able to receive any messages (heartbeats) from the other 2 brokers.
If it can connect (OUTPUT is enabled), then it can write to them, but for the replication to be acknowledged it needs INPUT access.
So both INPUT and OUTPUT are needed here as well.
Broker 29092 is as good as nothing here, leaving the system in an unrecoverable state!
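One possible reading of the ISR shrink, as far as I understand Kafka replication: followers pull data from the partition leader, so brokers 1 and 3 have to open connections to port 29092 to fetch partition 1. With that port blocked they stop fetching, and the leader drops them from the ISR once they have lagged for longer than replica.lag.time.max.ms. The toy model below only illustrates that rule. It is not Kafka code, and the function name and all timings are hypothetical.

```python
def shrink_isr(leader, isr, last_caught_up_ms, now_ms, max_lag_ms):
    """Return the ISR after dropping followers that have lagged too long.

    The leader always stays in its own ISR; a follower stays only if it
    last caught up within max_lag_ms of now_ms.
    """
    return [
        b for b in isr
        if b == leader or now_ms - last_caught_up_ms[b] <= max_lag_ms
    ]


# Hypothetical timings: both followers last caught up at t=0,
# just before the iptables rule was added.
last_caught_up = {3: 0, 1: 0}

print(shrink_isr(2, [2, 3, 1], last_caught_up, now_ms=5_000, max_lag_ms=10_000))   # [2, 3, 1]
print(shrink_isr(2, [2, 3, 1], last_caught_up, now_ms=15_000, max_lag_ms=10_000))  # [2]
```

Under this reading, the leader keeps its ZooKeeper session (outbound traffic still works) and so stays leader, while its ISR shrinks to itself, which matches the "isrs: 2" line in the metadata after the rule.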