Question:

I am using Apache Kafka to produce and consume a 5 GB file. I want to know whether there is a way for a message to be removed from the topic automatically after it has been consumed. Do I have any way to keep track of consumed messages? I don't want to delete them manually.
Answer 1:
In Kafka, keeping track of what has been consumed is the responsibility of the consumer, and this is also one of the main reasons Kafka scales horizontally so well.
Using the high-level consumer API will do this for you automatically by committing consumed offsets to ZooKeeper (or, with a more recent configuration option, to a special internal Kafka topic that keeps track of consumed offsets).
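A minimal sketch of automatic offset committing with the newer Java consumer client (which replaced the old high-level and simple consumer APIs); the broker address localhost:9092, the group id file-consumer-group, and the topic name file-chunks are assumptions for illustration. With enable.auto.commit=true the client periodically commits the offsets returned by poll() on your behalf:

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class AutoCommitConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // assumed broker address
        props.put("group.id", "file-consumer-group");        // assumed consumer group
        props.put("enable.auto.commit", "true");              // offsets committed automatically
        props.put("auto.commit.interval.ms", "5000");         // commit roughly every 5 seconds
        props.put("key.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                  "org.apache.kafka.common.serialization.ByteArrayDeserializer");

        try (KafkaConsumer<String, byte[]> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("file-chunks")); // assumed topic
            while (true) {
                ConsumerRecords<String, byte[]> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, byte[]> record : records) {
                    // Process the chunk; the consumed offset is committed for us.
                    System.out.printf("offset=%d, size=%d bytes%n",
                                      record.offset(), record.value().length);
                }
            }
        }
    }
}

Because the group's committed offsets are tracked by Kafka, restarting this consumer resumes from the last committed offset instead of re-reading the whole file.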
The simple consumer API makes you deal with how and where to keep track of consumed offsets yourself.
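If you want that same level of control with the current client, a sketch of the manual approach (reusing the setup from the example above) is to turn auto-commit off and commit only after you have fully processed what poll() returned:

// Variation on the sketch above: disable auto-commit so the application
// decides when an offset counts as consumed.
props.put("enable.auto.commit", "false");

ConsumerRecords<String, byte[]> records = consumer.poll(Duration.ofMillis(500));
for (ConsumerRecord<String, byte[]> record : records) {
    // ... write the chunk to disk, update a database, etc. ...
}
// Only after processing succeeds, mark everything from this poll as consumed.
consumer.commitSync();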
Purging of messages in Kafka is done automatically, either by specifying a retention time for a topic or by defining a size limit for it. For your case of one 5 GB file, the file will be deleted after the retention period you define has passed, regardless of whether it has been consumed or not.
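If you set retention per topic rather than broker-wide, a sketch of the relevant topic-level settings, with illustrative values, would be:

retention.ms=3600000
retention.bytes=6442450944

Here retention.ms is time-based (one hour in this example) and retention.bytes is size-based (roughly 6 GB per partition); whichever limit is reached first makes the older log segments eligible for deletion.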
Answer 2:
As far as I know, you can delete the consumed data from the logs by reducing the retention time. The default retention time for the log is 168 hours, after which the data is automatically removed from the Kafka topic you created. So my suggestion is to go to server.properties, which is located in the config folder, and change 168 to a smaller value, so that no data remains beyond the amount of time you set for log.retention.hours. That should solve your issue.
log.retention.hours=168
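For example, a reduced broker-wide setting in server.properties (the value here is illustrative) could look like:

log.retention.hours=1
log.retention.check.interval.ms=300000

Note that log.retention.check.interval.ms controls how often the broker looks for log segments eligible for deletion (the default is 5 minutes), so a shorter retention only takes effect at the next check.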
Keep coding