I launch two consumers in the same consumer group and subscribe to 20 topics (each has only one partition).
Only one consumer is used:
kafka-consumer-groups --bootstrap-server XXXXX:9092 --group foo
--describe --members --verbose
Note: This will not show information about old Zookeeper-based consumers.
CONSUMER-ID HOST CLIENT-ID #PARTITIONS ASSIGNMENT
rdkafka-07cbd673-6a16-4d55-9625-7f0925866540 /xxxxx rdkafka 20 arretsBus(0), capteurMeteo(0), capteurPointMesure(0), chantier(0), coworking(0), horodateur(0), incident(0), livraison(0), meteo(0), metro(0), parkrelais(0), qair(0), rhdata(0), sensUnique(0), trafic(0), tramway(0), tweets(0), voieRapide(0), zone30(0), zoneRencontre(0)
rdkafka-9a543197-6c97-4213-bd59-cb5a48e4ec15 /xxxx rdkafka 0
What am I doing wrong?
OK, I did some reading about this behavior, and it's interesting to know why it happens. Two of Kafka's partition assignment strategies are relevant here.
Range:
Assigns to each consumer a consecutive subset of partitions from each topic it subscribes to. So if consumers C1 and C2 are subscribed to two topics, T1 and T2, and each of the topics has three partitions, then C1 will be assigned partitions 0 and 1 from topics T1 and T2, while C2 will be assigned partition 2 from those topics. Because each topic has an uneven number of partitions and the assignment is done for each topic independently, the first consumer ends up with more partitions than the second. This happens whenever Range assignment is used and the number of consumers does not divide the number of partitions in each topic neatly.
RoundRobin:
Takes all the partitions from all subscribed topics and assigns them to consumers sequentially, one by one. If C1 and C2 described previously used RoundRobin assignment, C1 would have partitions 0 and 2 from topic T1 and partition 1 from topic T2. C2 would have partition 1 from topic T1 and partitions 0 and 2 from topic T2. In general, if all consumers are subscribed to the same topics (a very common scenario), RoundRobin assignment will end up with all consumers having the same number of partitions (or at most 1 partition difference).
The default strategy is Range, which explains why you are seeing such partition distribution.
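To make the difference concrete, here is a small simulation of the two strategies as described above. It is only a sketch, not Kafka's actual assignor code: it assumes every consumer subscribes to every topic, it orders consumers by name (real clients order by member id), and the topic and consumer names are made up for the example.

```python
from collections import defaultdict


def range_assign(topics, consumers):
    """Range: handle each topic independently and hand out consecutive
    blocks of its partitions to the consumers in sorted order."""
    members = sorted(consumers)
    assignment = defaultdict(list)
    for topic in sorted(topics):
        per_consumer, extra = divmod(topics[topic], len(members))
        start = 0
        for i, member in enumerate(members):
            # The first `extra` consumers get one more partition of this topic.
            count = per_consumer + (1 if i < extra else 0)
            assignment[member].extend((topic, p) for p in range(start, start + count))
            start += count
    return {m: assignment[m] for m in members}


def round_robin_assign(topics, consumers):
    """RoundRobin: pool the partitions of all topics, then deal them out
    to the consumers one by one."""
    members = sorted(consumers)
    pooled = [(t, p) for t in sorted(topics) for p in range(topics[t])]
    assignment = {m: [] for m in members}
    for i, topic_partition in enumerate(pooled):
        assignment[members[i % len(members)]].append(topic_partition)
    return assignment


# 20 topics with a single partition each, two consumers (hypothetical names).
topics = {f"topic{i:02d}": 1 for i in range(20)}
consumers = ["c1", "c2"]

range_result = range_assign(topics, consumers)
rr_result = round_robin_assign(topics, consumers)

print({m: len(tps) for m, tps in range_result.items()})  # {'c1': 20, 'c2': 0}
print({m: len(tps) for m, tps in rr_result.items()})     # {'c1': 10, 'c2': 10}
```

With one partition per topic, Range gives partition 0 of every topic to the first consumer, so the second one never receives anything. RoundRobin pools the 20 partitions first and therefore splits them 10 and 10.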
So, I did a small experiment. I created two console consumers, each listening to topics test1, test2, test3 and test4,
where each topic has only one partition. As expected, consumer-1 was assigned all partitions.
Then I changed the partition assignment strategy to org.apache.kafka.clients.consumer.RoundRobinAssignor
and passed it to both console consumers, and voilà, both consumers now get 2 partitions each.
UPDATE:
Oops, I didn't see that it had already been answered a couple of minutes earlier.
In Kafka, a topic partition can be consumed by at most one consumer in a consumer group at a time, to avoid contention between consumers.
In Apache Kafka, the number of partitions defines the level of parallelism you can get from consumers in the same consumer group; two consumers that are part of the same consumer group cannot read from the same partition.
In your case, each topic has just one partition, which will be assigned to only one consumer, and the other consumer will sit idle waiting for a rebalance: if the first consumer disconnects, the second one will move from idle to consuming the partition.
If you expect each consumer to get 10 topics, note that Apache Kafka does not distribute whole topics across consumers. As I said, the unit of parallelism is the partition within a topic, not the topic itself.
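The idle-consumer behavior for a single partition can be illustrated with a toy model. This is a sketch under simplifying assumptions, not how the group coordinator is implemented: `assign` is a hypothetical helper that deals partitions out one by one, and a "rebalance" is modeled as simply calling it again with the remaining consumers.

```python
def assign(partitions, consumers):
    """Give every partition exactly one owner, dealing them out one by one."""
    owners = {c: [] for c in consumers}
    for i, partition in enumerate(partitions):
        owners[consumers[i % len(consumers)]].append(partition)
    return owners


# One topic with a single partition, two consumers in the same group.
partitions = [("trafic", 0)]

before = assign(partitions, ["c1", "c2"])
# c1 disconnects; the group rebalances with the remaining member.
after = assign(partitions, ["c2"])

print(before)  # {'c1': [('trafic', 0)], 'c2': []}
print(after)   # {'c2': [('trafic', 0)]}
```

With only one partition there is nothing left for the second consumer until the first one leaves the group.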
OK, I found the problem. It works with:
'partition.assignment.strategy': 'roundrobin'
CONSUMER-ID HOST CLIENT-ID #PARTITIONS ASSIGNMENT
rdkafka-fa7ec1ca-1c34-498b-bd22-24ad6ca99645 /XXXX rdkafka 10 capteurPointMesure(0), meteo(0), metro(0), parkrelais(0), qair(0), sensUnique(0), tweets(0), voieRapide(0), zone30(0), zoneRencontre(0)
rdkafka-89f765b6-2014-4b8c-bef2-c6406763118b /XXXX rdkafka 10 arretsBus(0), capteurMeteo(0), chantier(0), coworking(0), horodateur(0), incident(0), livraison(0), rhdata(0), trafic(0), tramway(0)
The range strategy works per topic, so with one partition per topic the same consumer gets every partition; with roundrobin I get the expected result.
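For reference, this is roughly how the setting could look in a full consumer. It is a sketch that assumes the confluent-kafka Python client (which wraps librdkafka); the thread does not say which rdkafka binding is in use. The helper names `build_consumer_config` and `consume` are made up for the example, and the broker address and group name are the placeholders from the question.

```python
def build_consumer_config(bootstrap_servers, group_id):
    """Return a librdkafka-style config that requests round-robin assignment."""
    return {
        "bootstrap.servers": bootstrap_servers,
        "group.id": group_id,
        "partition.assignment.strategy": "roundrobin",
    }


def consume(topics, config):
    """Subscribe to the given topics and print every message received."""
    # Imported here so the config helper can be used without the client installed.
    from confluent_kafka import Consumer

    consumer = Consumer(config)
    consumer.subscribe(topics)
    try:
        while True:
            msg = consumer.poll(1.0)  # wait up to one second for a message
            if msg is None:
                continue
            if msg.error():
                print(f"consumer error: {msg.error()}")
                continue
            print(msg.topic(), msg.partition(), msg.value())
    finally:
        consumer.close()  # leave the group cleanly so a rebalance is triggered


config = build_consumer_config("XXXXX:9092", "foo")
# consume(["trafic", "meteo", "tweets"], config)  # needs a reachable broker
```

Every consumer in the group should be started with the same strategy, since the members have to agree on a common assignor.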