Conditions in which a Kafka Consumer (Group) triggers a rebalance

Posted 2019-04-09 23:44

Question:

I was going through the Consumer Config for Kafka.

  • https://kafka.apache.org/documentation/#newconsumerconfigs

What are the parameters that will trigger a rebalance? For instance, will the following parameter? Are there any other parameters we need to change, or will the defaults suffice?

connections.max.idle.ms: Close idle connections after the number of milliseconds specified by this config. (Type: long, default: 540000, importance: medium)

Also, we have three different topics:

  1. Is it a bad idea to have the same Consumer Group (same ID) consuming from multiple topics?
  2. Assuming the above scenario is valid (not necessarily the best practice): if one of the topics has very light traffic, will it cause the Consumer Group to rebalance?

    A follow-up question: what factors affect rebalancing and its performance?

Answer 1:

These conditions will trigger a group rebalance:

  • The number of partitions changes for any of the subscribed topics
  • A topic is created or deleted
  • An existing member of the consumer group dies
  • A new member is added to an existing consumer group via the join API
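To make the conditions above concrete, here is a minimal sketch of why a change in group membership or partition count forces partitions to be redistributed. It is a toy round-robin model, not Kafka's actual assignor; the class name ToyAssignor and the member names c1, c2, c3 are made up for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model of partition assignment. This is NOT Kafka's real assignor. */
final class ToyAssignor {

    /**
     * Deals partitions 0..partitionCount-1 to the members in round-robin order.
     * The dealing strategy is an illustrative assumption.
     */
    static Map<String, List<Integer>> assign(List<String> members, int partitionCount) {
        Map<String, List<Integer>> assignment = new LinkedHashMap<>();
        if (members.isEmpty()) {
            return assignment; // nobody in the group, nothing to assign
        }
        for (String member : members) {
            assignment.put(member, new ArrayList<>());
        }
        for (int partition = 0; partition < partitionCount; partition++) {
            String owner = members.get(partition % members.size());
            assignment.get(owner).add(partition);
        }
        return assignment;
    }

    public static void main(String[] args) {
        // Two members share four partitions.
        System.out.println(assign(List.of("c1", "c2"), 4));
        // A member joins: the same partitions are dealt out differently.
        System.out.println(assign(List.of("c1", "c2", "c3"), 4));
        // A member dies: the survivor takes over every partition.
        System.out.println(assign(List.of("c1"), 4));
        // The partition count grows: the new partitions need owners.
        System.out.println(assign(List.of("c1", "c2"), 6));
    }
}
```

In each case the previous assignment is no longer valid, which is the situation a rebalance resolves. Message volume does not appear anywhere in the inputs.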

Is it a bad idea to have the same Consumer Group (same ID) consuming from multiple topics?

It is at least valid; whether it is good or bad depends on your specific case. The official Java client API supports it, as this method definition shows:

public void subscribe(Collection<String> topics,
             ConsumerRebalanceListener listener)

It accepts a collection of topics.

If one of the topics has very light traffic, will it cause the Consumer Group to rebalance?

No, because that is not one of the conditions listed above. From the topic's side, a rebalance happens only when the topic is deleted or its partition count changes.

Update.

Thanks to @Hans Jespersen for the comment about session timeouts and heartbeats.

The following is quoted from the Kafka Consumer javadoc:

After subscribing to a set of topics, the consumer will automatically join the group when poll(long) is invoked. The poll API is designed to ensure consumer liveness. As long as you continue to call poll, the consumer will stay in the group and continue to receive messages from the partitions it was assigned. Underneath the covers, the poll API sends periodic heartbeats to the server; when you stop calling poll (perhaps because an exception was thrown), then no heartbeats will be sent. If a period of the configured session timeout elapses before the server has received a heartbeat, then the consumer will be kicked out of the group and its partitions will be reassigned.

In your question, you ask which parameters will trigger a rebalance.

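In this case, two configs relate to rebalancing: session.timeout.ms and max.poll.records. session.timeout.ms is how long the group coordinator waits for a heartbeat before removing the consumer from the group. max.poll.records caps the number of records returned by a single poll, which bounds the work done between polls. In clients from 0.10.1 onward, where heartbeats are sent from a background thread, max.poll.interval.ms plays a related role by bounding the time allowed between poll calls.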
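As a rough illustration, the two settings could be supplied to a consumer through a Properties object like the one below. This is only a sketch: the config keys are the real names, but the class name, broker address, group id, and the chosen values are assumptions for the example, not recommendations.

```java
import java.util.Properties;

/** Builds consumer settings relevant to rebalancing. All values are illustrative. */
final class ConsumerTimeoutConfig {

    static Properties build() {
        Properties props = new Properties();
        // Hypothetical broker address and group id, for illustration only.
        props.setProperty("bootstrap.servers", "localhost:9092");
        props.setProperty("group.id", "example-group");
        // How long the coordinator waits for a heartbeat before evicting the consumer.
        props.setProperty("session.timeout.ms", "30000");
        // Upper bound on records returned by one poll, which limits work per loop iteration.
        props.setProperty("max.poll.records", "100");
        return props;
    }

    public static void main(String[] args) {
        // Print each configured key and value.
        build().forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

A Properties object of this shape is what the Java client's consumer constructor accepts, alongside the required deserializer settings, which are left out here.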

From this we can also learn that it is bad practice to do a lot of work between calls to poll: long-running processing may delay the heartbeat and cause a session timeout.
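A back-of-the-envelope check can show when this becomes a problem. The sketch below assumes the model described in the quoted javadoc (heartbeats are only sent while poll is being called) and a uniform processing cost per record. The class PollBudget and its method names are hypothetical helpers, not part of any Kafka API.

```java
/** Rough estimate of whether processing one batch outlasts the session timeout. */
final class PollBudget {

    /** Time to handle a full batch, assuming every record costs the same. */
    static long batchMillis(int maxPollRecords, long millisPerRecord) {
        return (long) maxPollRecords * millisPerRecord;
    }

    /**
     * True if handling a full batch takes at least as long as the session timeout,
     * in which case no heartbeat would be sent in time under the assumed model.
     */
    static boolean exceedsSessionTimeout(int maxPollRecords, long millisPerRecord,
                                         long sessionTimeoutMs) {
        return batchMillis(maxPollRecords, millisPerRecord) >= sessionTimeoutMs;
    }

    public static void main(String[] args) {
        // 500 records at 100 ms each is 50 s of work against a 30 s timeout.
        System.out.println(exceedsSessionTimeout(500, 100, 30_000));
        // 100 records at 100 ms each is 10 s of work against a 30 s timeout.
        System.out.println(exceedsSessionTimeout(100, 100, 30_000));
    }
}
```

Under these assumptions, lowering max.poll.records or raising session.timeout.ms are the two ways to bring the batch time back under the limit.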