I have a topic with 3 partitions on 2 brokers. (Kafka version: 0.8.1)
Messages are bulk-loaded (about 10k of them) using distinct user GUIDs (for example, FC42B34DD7658503E040970A2C437358) as the partition key.
While the messages were loading, I had one consumer running (consumer1), and it handled the messages fine.
Then I started another consumer (consumer2) with the same consumer group id.
What I noticed is that consumer1 stopped handling messages and consumer2 started handling all of them.
When I stopped consumer2, consumer1 took over and resumed processing.
I was expecting the two consumers to share the load.
Any clue where the problem could be? Thanks.
I can't say exactly what is happening in your consumers without inspecting your ZooKeeper cluster, but one possible scenario is that your producers are not distributing messages evenly across the partitions.
Within a single consumer group, each partition is owned by exactly one consumer. That consumer is called the partition owner, and all messages that arrive in a partition are consumed exclusively by its partition owner. (For more information, refer to Consumers in the Kafka 0.8.1 documentation.)
Let us say there are three partitions A, B, and C and two consumers 1 and 2, and the producers only send messages to partition B.
When there is only consumer 1, all messages in partition B are consumed by consumer 1.
When you introduce consumer 2, the consumer rebalancing algorithm now assigns partition B to consumer 2. Since your producers send messages only to partition B, consumer 2 becomes the only consumer that receives messages.
After you stop consumer 2, partition B is assigned to consumer 1 again, and all messages are consumed by consumer 1.
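To make the scenario concrete, here is a minimal Python sketch of it. It is a toy model, not Kafka code: the ownership maps are hypothetical outcomes of a rebalance that I wrote by hand, and `consumed_per_consumer` is a helper name I made up for illustration.

```python
from collections import Counter

def consumed_per_consumer(owners, message_partitions):
    """Count messages handled by each consumer, given a partition -> owner map."""
    return Counter(owners[p] for p in message_partitions)

# Assumption: the producer is skewed and every message lands on partition B.
messages = ["B"] * 10_000

# Only consumer1 is running, so it owns every partition.
only_consumer1 = {"A": "consumer1", "B": "consumer1", "C": "consumer1"}

# Hypothetical result of the rebalance after consumer2 joins:
# consumer2 ends up owning A and B, consumer1 keeps C.
after_rebalance = {"A": "consumer2", "B": "consumer2", "C": "consumer1"}

print(consumed_per_consumer(only_consumer1, messages))
print(consumed_per_consumer(after_rebalance, messages))
```

In the second case consumer1 still owns a partition, but since nothing is written to it, consumer1 looks idle. That matches what you observed.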
That is the one scenario I can think of. Check whether your producer implementation has a distribution problem.
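One rough way to sanity-check the distribution offline is to hash a sample of keys and count how many fall into each partition. The Python sketch below assumes a partitioner that takes a Java-style `hashCode` of the string key, clears the sign bit, and takes it modulo the partition count. That is my assumption about the scheme, not something confirmed for your setup, so check which partitioner class and key type your producer really uses. The random keys are only stand-ins for your real GUIDs.

```python
from collections import Counter
import uuid

def java_string_hash(s):
    """Emulate Java's String.hashCode() as a signed 32-bit int (BMP characters only)."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return h - 0x100000000 if h >= 0x80000000 else h

def partition_for(key, num_partitions):
    # Assumed scheme: clear the sign bit, then take modulo the partition count.
    return (java_string_hash(key) & 0x7FFFFFFF) % num_partitions

# Stand-in keys: 32 uppercase hex characters, shaped like the GUID in the question.
keys = [uuid.uuid4().hex.upper() for _ in range(10_000)]
counts = Counter(partition_for(k, 3) for k in keys)
print(sorted(counts.items()))
```

If the counts for your real keys come out roughly even here but the brokers still show all the data in one partition, the skew is more likely in how the producer is configured or how the key is passed than in the keys themselves.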