I have been using the python-kaka module to consume from a kafka broker. I want to consume from the same topic with 'x' number of partitions in parallel. The documentation has this :
# Use multiple consumers in parallel w/ 0.9 kafka brokers
# typically you would run each on a different server / process / CPU
consumer1 = KafkaConsumer('my-topic',
group_id='my-group',
bootstrap_servers='my.server.com')
consumer2 = KafkaConsumer('my-topic',
group_id='my-group',
bootstrap_servers='my.server.com')
Does this mean I can create a separate consumer for each process that I spawn? Also, will there be an overlap on the messages being consumed by consumer1 and consumer2 ?
Thanks
Yes, you can create multiple consumers in multiple threads/processes (and even run them in parallel on different machines). As long as all consumers use the same groupID, there will be no overlap. Kafka assigned each topic partition to a single consumer within a consumer group. Be aware, that using more consumers than available topic partitions will result in idle consumers.