Given the following setup:
- Kafka v0.11.0.0
- 3 brokers
- 2 topics, each with 2 partitions, replication factor of 3
- 2 consumer groups, one for each topic
- 3 servers that contain consumers
Each server runs two consumers, one per topic:
- Server A
  - consumer-A1 in group topic-1-group consuming topic-1
  - consumer-A2 in group topic-2-group consuming topic-2
- Server B
  - consumer-B1 in group topic-1-group consuming topic-1
  - consumer-B2 in group topic-2-group consuming topic-2
- Server C
  - consumer-C1 in group topic-1-group consuming topic-1
  - consumer-C2 in group topic-2-group consuming topic-2
In this scenario, when we examine the output of kafka-consumer-groups.bat for group topic-1-group, we see the following:
- consumer-B1 is assigned to topic-1 partition-1
- consumer-C1 is assigned to topic-1 partition-0
- consumer-A1 is assigned to no partition
This is what we would expect. Since the topic has only 2 partitions, only two consumers in the group can be active at a time, and the third sits idle. We are able to consume messages from the topic just fine.
Next, we shut down Server B (whose consumer-B1 is actively assigned to a partition). We would expect topic-1-group to rebalance, with consumer-A1 taking the place of consumer-B1, so that the following is true:
- consumer-A1 is assigned to topic-1 partition-1
- consumer-C1 is assigned to topic-1 partition-0
- consumer-B1 is assigned to nothing since it is no longer active
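To make the expectation concrete, here is a minimal sketch of a range-style split of one topic's partitions across the group members, before and after one member leaves. It is a simplified, standalone simulation and not Kafka's actual assignor code. The class and method names are hypothetical, and real member IDs are generated at join time, so which consumer ends up idle (or with which partition) can differ from what this prints. Only the counts are meant to match.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified simulation of a range-style assignment for ONE topic.
// It does not talk to a broker and is not the real Kafka assignor.
public class RangeAssignmentSketch {

    // Sort the members, then hand out contiguous partition ranges.
    // The first (numPartitions % members) members receive one extra partition.
    static Map<String, List<Integer>> assign(List<String> members, int numPartitions) {
        List<String> sorted = new ArrayList<>(members);
        Collections.sort(sorted);

        Map<String, List<Integer>> result = new LinkedHashMap<>();
        int perMember = numPartitions / sorted.size();
        int extra = numPartitions % sorted.size();
        int next = 0;
        for (int i = 0; i < sorted.size(); i++) {
            int count = perMember + (i < extra ? 1 : 0);
            List<Integer> owned = new ArrayList<>();
            for (int j = 0; j < count; j++) {
                owned.add(next++);
            }
            result.put(sorted.get(i), owned); // an empty list means the member is idle
        }
        return result;
    }

    public static void main(String[] args) {
        // Three members, two partitions: exactly one member gets an empty list.
        System.out.println(assign(
                Arrays.asList("consumer-A1", "consumer-B1", "consumer-C1"), 2));

        // After consumer-B1 leaves: both remaining members own one partition each.
        System.out.println(assign(
                Arrays.asList("consumer-A1", "consumer-C1"), 2));
    }
}
```

In other words, once the rebalance completes with two remaining members, neither of them should be idle.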
What we actually see is that the consumer group topic-1-group enters a rebalancing state that never seems to finish. Heartbeats also appear to fail while the group is rebalancing.
The only way we have found to recover is to shut down another server so that only one consumer remains in topic-1-group. With a single consumer, we successfully receive messages for the topic. If we then start the other two servers again, we continue to receive messages successfully.
Questions
- Is this a valid usage scenario?
- What is expected in this sort of scenario?
- Could there be an issue with the consumers? (For configuration, we use the defaults for everything except the basics such as topic and consumer group. We use KafkaConsumer.subscribe(Collection) and do not manually assign partitions.)
- Could there be an issue with the brokers/Zookeeper?
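Since the consumers run on defaults, it may help to spell out the timing-related settings that govern group membership and rebalancing. The sketch below only builds a java.util.Properties object. The values are the consumer defaults as I understand them for the 0.11 line and should be checked against the consumer configuration documentation for the exact client version in use. The class and method names are hypothetical.

```java
import java.util.Properties;

// Hypothetical helper: lists the consumer settings most relevant to rebalancing.
// Values are assumed defaults for the 0.11 Java consumer; verify before relying on them.
public class ConsumerTimeoutSketch {

    static Properties rebalanceRelatedSettings() {
        Properties props = new Properties();
        props.setProperty("group.id", "topic-1-group");
        // How long the coordinator waits without a heartbeat before evicting a member.
        props.setProperty("session.timeout.ms", "10000");
        // How often the background thread sends heartbeats.
        props.setProperty("heartbeat.interval.ms", "3000");
        // Maximum allowed gap between poll() calls before the member is considered failed.
        props.setProperty("max.poll.interval.ms", "300000");
        // Upper bound on records returned by a single poll().
        props.setProperty("max.poll.records", "500");
        return props;
    }

    public static void main(String[] args) {
        rebalanceRelatedSettings().forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

If processing between two poll() calls can exceed max.poll.interval.ms, a member may fail to rejoin in time during a rebalance, so that is one setting to compare against how long message handling actually takes.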