Kafka producer resilience config: Fail but never block

Posted 2019-05-11 07:22

I am currently learning some Kafka best practices from Netflix (https://www.slideshare.net/wangxia5/netflix-kafka). It is a very good slide deck. However, I really don't understand one of the slides (slide 18), which covers producer resilience configuration. I hope someone on Stack Overflow is kind enough to give me some insight (I can't find the video or reach the author...).

The slide mentioned: Fail but never block in producer resilience configuration.

block.on.buffer.full=false

Even though this is a deprecated configuration, I guess the idea is to let the producer fail right away rather than block and wait. With the current Kafka configuration, I can set max.block.ms to a small value so that sending fails instead of blocking.
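A minimal sketch of those settings, assuming they are collected in a java.util.Properties object that is later passed to the KafkaProducer constructor. max.block.ms, buffer.memory, bootstrap.servers, key.serializer and value.serializer are real producer settings; the helper name FailFastProducerConfig, the broker address and the concrete values are illustrative assumptions, not values taken from the slides. Plain string keys are used so the snippet compiles without the Kafka client on the classpath.

```java
import java.util.Properties;

// Hypothetical helper: producer settings that prefer failing fast over blocking in send().
class FailFastProducerConfig {
    static Properties build(String bootstrapServers, long maxBlockMs) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        props.setProperty("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.setProperty("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        // Successor of the deprecated block.on.buffer.full=false:
        // bounds how long send() may block on a full buffer or missing metadata.
        props.setProperty("max.block.ms", Long.toString(maxBlockMs));
        // Size of the in-memory record buffer (32 MB, which is also the default).
        props.setProperty("buffer.memory", Long.toString(32L * 1024 * 1024));
        return props;
    }
}
```

For example, passing FailFastProducerConfig.build("localhost:9092", 100) to new KafkaProducer<String, String>(props) would give a producer whose send() waits at most about 100 ms on a full buffer or missing metadata, instead of the default 60000 ms.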

Question 1: Why do we want it to fail right away? Does that mean retrying later instead of blocking?

Handle Potential Block for first meta data request

Question 2: I understand metadata on the consumer side, i.e. registering the consumer group and that sort of thing, but what is a metadata request from the producer's point of view? Can it block? Is there any Kafka documentation that describes it?

Periodically check whether Kafka producer was open successfully 

Question 3: Is there a way to perform that check, and what are the benefits of doing it?

Thanks in advance :)

1 answer
Summer. 凉城
Post #2 · 2019-05-11 08:07

You have to keep in mind how a Kafka producer works.

From the API documentation:

The producer consists of a pool of buffer space that holds records that haven't yet been transmitted to the server as well as a background I/O thread that is responsible for turning these records into requests and transmitting them to the cluster.

When you call send() to send a record to the broker, the record is added to an internal buffer (its size is configured with the buffer.memory property). From there, different things can happen:

  1. Happy path: the background I/O thread turns the buffered records into requests to the broker, the broker ACKs them, and everything is fine.
  2. The records cannot be sent to the Kafka broker (the connection to the broker is broken, you are producing records faster than they can be sent out, etc.). In this case it is up to you to decide what to do. If you set max.block.ms (the replacement for block.on.buffer.full) to a positive value, send() will block for up to this amount of time(1) and then fail with a timeout exception.
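The buffer behaviour described in the list can be illustrated with a toy model (not Kafka's actual record accumulator): a bounded queue whose trySend waits at most maxBlockMs for free space and then gives up. BoundedSendBuffer and its methods are hypothetical names, and ArrayBlockingQueue.offer(e, timeout, unit) stands in for the producer's internal buffer allocation.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

// Toy model of the producer's record buffer; NOT the real KafkaProducer internals.
class BoundedSendBuffer {
    private final BlockingQueue<String> buffer;
    private final long maxBlockMs;

    BoundedSendBuffer(int capacity, long maxBlockMs) {
        this.buffer = new ArrayBlockingQueue<>(capacity);
        this.maxBlockMs = maxBlockMs;
    }

    // Waits up to maxBlockMs for free space and returns false if none appears.
    // With maxBlockMs == 0 the call never blocks ("fail but never block").
    boolean trySend(String record) {
        try {
            return buffer.offer(record, maxBlockMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            return false;
        }
    }

    // Stands in for the background I/O thread draining one record to the broker.
    String drainOne() {
        return buffer.poll();
    }

    int size() {
        return buffer.size();
    }
}
```

With maxBlockMs set to 0 the model fails immediately when the buffer is full; with a positive value the caller stalls for up to that long before getting a failure.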

Regarding your questions: (1) If I read the slides correctly, Netflix explicitly wants to throw away the messages it can't send to the broker (instead of blocking, retrying, failing, ...). This of course depends heavily on your application and the kind of messages you are dealing with. If they are just log messages, it might be no big deal. If it comes to financial transactions, you may want to
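A sketch of that drop-instead-of-block policy, assuming a hypothetical send hook (a Predicate<String> that returns false when the record cannot be handed off). In a real application the hook would wrap producer.send(...) on a producer configured with a small max.block.ms; since the Kafka producer also reports some failures asynchronously through the send callback, a real version would count drops there too. DroppingSender is an illustrative name, not a Kafka class.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Predicate;

// Hypothetical wrapper: never blocks or throws into the caller, drops records it
// cannot hand off, and counts the drops so they can be logged or exported as a metric.
class DroppingSender {
    private final Predicate<String> sendHook; // false = record could not be buffered
    private final AtomicLong sent = new AtomicLong();
    private final AtomicLong dropped = new AtomicLong();

    DroppingSender(Predicate<String> sendHook) {
        this.sendHook = sendHook;
    }

    void send(String record) {
        boolean accepted;
        try {
            accepted = sendHook.test(record);
        } catch (RuntimeException e) {
            accepted = false; // any synchronous send failure counts as a drop
        }
        if (accepted) {
            sent.incrementAndGet();
        } else {
            dropped.incrementAndGet();
        }
    }

    long sentCount() { return sent.get(); }

    long droppedCount() { return dropped.get(); }
}
```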

(2) The producer needs some metadata about the cluster. For example, it needs to know how many partitions a topic has and which broker leads each partition, so that it can decide where each record goes. There is a good blog post by Hortonworks on how the producer works internally that I think is worth reading: https://community.hortonworks.com/articles/72429/how-kafka-producer-work-internally.html
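To make the partition point concrete: for a record with a key, the default partitioner hashes the serialized key with murmur2 and takes the result modulo the topic's partition count, and that count only comes from cluster metadata. The sketch below is modeled on the murmur2 variant in Kafka's client utilities and assumes string keys serialized as UTF-8 (what StringSerializer does by default). KeyPartitioner and partitionForKey are hypothetical names; treat it as an illustration rather than a guaranteed byte-for-byte match of Kafka's output.

```java
import java.nio.charset.StandardCharsets;

class KeyPartitioner {
    // 32-bit murmur2 hash, modeled on the variant used by Kafka's client utilities.
    static int murmur2(byte[] data) {
        int length = data.length;
        int seed = 0x9747b28c;
        final int m = 0x5bd1e995;
        final int r = 24;
        int h = seed ^ length;
        int length4 = length / 4;
        for (int i = 0; i < length4; i++) {
            int i4 = i * 4;
            int k = (data[i4] & 0xff) + ((data[i4 + 1] & 0xff) << 8)
                    + ((data[i4 + 2] & 0xff) << 16) + ((data[i4 + 3] & 0xff) << 24);
            k *= m;
            k ^= k >>> r;
            k *= m;
            h *= m;
            h ^= k;
        }
        // Mix in the trailing 1-3 bytes (intentional switch fall-through).
        int tail = length & ~3;
        switch (length % 4) {
            case 3:
                h ^= (data[tail + 2] & 0xff) << 16;
            case 2:
                h ^= (data[tail + 1] & 0xff) << 8;
            case 1:
                h ^= data[tail] & 0xff;
                h *= m;
        }
        h ^= h >>> 13;
        h *= m;
        h ^= h >>> 15;
        return h;
    }

    // Keyed records: clear the sign bit, then take the hash modulo the partition count.
    // numPartitions is exactly the piece of information that comes from metadata.
    static int partitionForKey(String key, int numPartitions) {
        byte[] keyBytes = key.getBytes(StandardCharsets.UTF_8);
        return (murmur2(keyBytes) & 0x7fffffff) % numPartitions;
    }
}
```

Because the result depends on numPartitions, the producer cannot route a keyed record until it has metadata for the topic, and adding partitions later changes where existing keys land.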

Furthermore, the statement "Handle Potential Block for first meta data request" points to an issue that, as far as I know, is still around: the very first call to send() for a topic makes a synchronous metadata request to the broker and therefore may take longer.
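A toy model of why only the first send pays the price: partition metadata for a topic is fetched synchronously the first time the topic is used and served from a cache afterwards. MetadataCache and the fetcher function are hypothetical stand-ins, not the producer's internal classes; the real client also refreshes metadata periodically (metadata.max.age.ms) and after errors, which this model ignores.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Toy model: the first lookup per topic calls the slow, blocking fetcher;
// later lookups for the same topic are answered from the cache.
class MetadataCache {
    private final Map<String, Integer> partitionCounts = new ConcurrentHashMap<>();
    private final Function<String, Integer> fetcher; // hypothetical synchronous metadata request
    private final AtomicInteger fetches = new AtomicInteger();

    MetadataCache(Function<String, Integer> fetcher) {
        this.fetcher = fetcher;
    }

    int partitionsFor(String topic) {
        return partitionCounts.computeIfAbsent(topic, t -> {
            fetches.incrementAndGet();
            return fetcher.apply(t);
        });
    }

    int fetchCount() {
        return fetches.get();
    }
}
```

One way to keep that first wait out of the request path is to call producer.partitionsFor(topic) once during application startup; it waits for the same metadata and is itself bounded by max.block.ms.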

(3) The broker closes connections from producers that have been idle for some time (see connections.max.idle.ms). I am not aware of a standard way to keep the producer's connection alive, or even to check whether it is still alive. What you could do is periodically send a metadata request to the broker (producer.partitionsFor(anyTopic)). But again, this may not be an issue for your application.
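A minimal sketch of such a periodic check, assuming the probe is injected as a Runnable that, in a real application, would call producer.partitionsFor(someTopic). ProducerHealthCheck, the probe and the schedule are hypothetical. Keep in mind that partitionsFor() can be answered from the producer's cached metadata, so a successful probe is a coarse signal rather than proof that a broker connection is open; what the check gives you is a flag you can alert on when metadata can no longer be obtained.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical health checker: runs a probe on a fixed schedule and records
// whether the last attempt succeeded. It never throws into the caller.
class ProducerHealthCheck implements AutoCloseable {
    private final Runnable probe; // e.g. () -> producer.partitionsFor("some-topic")
    private final AtomicBoolean healthy = new AtomicBoolean(false);
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    ProducerHealthCheck(Runnable probe) {
        this.probe = probe;
    }

    // One check: any runtime exception (e.g. a timeout from the probe) marks it unhealthy.
    void checkOnce() {
        try {
            probe.run();
            healthy.set(true);
        } catch (RuntimeException e) {
            healthy.set(false);
        }
    }

    void start(long periodMs) {
        scheduler.scheduleAtFixedRate(this::checkOnce, 0, periodMs, TimeUnit.MILLISECONDS);
    }

    boolean isHealthy() {
        return healthy.get();
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
    }
}
```

For example, start(60_000) would run the probe once a minute, and isHealthy() could feed a readiness endpoint or an alert.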


(1) The details of what counts toward the elapsed time get a bit tricky. For max.block.ms it is actually:

  • metadata fetch time
  • buffer full block time
  • serialization time (customized serializer)
  • partitioning time (customized partitioner)