There is an application (not mine) that reads messages from Kafka, does some processing on them, and stores records in a database. I've put together a program in Java that writes messages into the queue at a given rate. Right now, it does a simple measure of performance by querying the database at the end of the test run to ensure that records in = records out. However, I'd like to expand it to periodically check the queue to see how many messages are pending that the application hasn't yet processed to see if it's getting backed up.
I figure that I can check offset of the application's group ID in Zookeeper. I looked at the Kafka documentation, but it only gives basic consumer examples and the API documentation is sparse at best, so I'm not sure how to go about finding this information.
What APIs to I need to call in order to find out where in the queue the application is currently at, and how many messages are in the queue behind that position?
I'm using Kafka 2.10-0.8.2.1 with a single Zookeeper instance and three Kafka instances, and the load tester is using the 0.8.2.1 Java API. The topic in question has three partitions (one on each Kafka server), however for the purpose of the test there is only a single consumer.
I would suggest looking at the already provided tools in Kafka (code is available in src if you need to call the API directly). In particular,
$ bin/kafka-run-class.sh kafka.tools.ConsumerOffsetChecker --group consumer-group1 --zkconnect zkhost:zkport --topic topic1
Will show you the offset and lag:
consumer-group1,topic1,0-0 (Group,Topic,BrokerId-PartitionId)
Owner = consumer-group1-consumer1
Consumer offset = 70121994703
= 70,121,994,703 (65.31G)
Log size = 70122018287
= 70,122,018,287 (65.31G)
Consumer lag = 23584
= 23,584 (0.00G)
References:
There are several offsets Kafka exposes (via JMX), that you can use to figure out how much lag consumers have (per topic, per partition). These are Latest Offset (basically where Kafka Broker is with its writes of new data) and Consumer Offset (where Consumers are with their reads). The delta between these two is known as Consumer Lag, and this tells you how far behind real-time Consumers are. Based on this info one can also derive Broker Write Rate and Consumer Read Rate, which are handy to know and see in your Kafka monitoring tool, too. See Kafka Consumer Lag Monitoring for more details. If you just want a tool that can chart a bunch of Kafka metrics, see SPM for Kafka monitoring. HTH.