I know that newer Kafka versions have a new retention policy option: log compaction, which deletes old versions of messages that share the same key. But after a long time we will end up with too many compacted log segments full of old messages. How can we clean up this compacted log automatically?
UPDATE:
I should clarify that we need both a compacted log and a way to clean up old messages in it at the same time. I found a discussion of the same problem here http://grokbase.com/t/kafka/users/14bv6gaz0t/kafka-0-8-2-log-cleaner but could not find out how to manually issue tombstone markers for a message, and I have no idea how to do this.
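For readers unfamiliar with the term: a tombstone is a record with a key and a null value, and compaction uses it as a signal to eventually drop that key entirely. The following is only a toy model of that idea in plain Python, not Kafka's cleaner code. The `compact` function and the sample keys are made up for illustration, and it ignores the grace period (`delete.retention.ms`) during which a real broker still keeps the tombstone.

```python
from typing import Dict, List, Optional, Tuple

# (key, value); a value of None models a tombstone record.
Record = Tuple[str, Optional[str]]


def compact(log: List[Record]) -> List[Record]:
    """Keep only the latest record per key, dropping keys whose latest record is a tombstone."""
    latest_offset: Dict[str, int] = {}
    for offset, (key, _value) in enumerate(log):
        latest_offset[key] = offset  # later offsets overwrite earlier ones
    return [
        (key, value)
        for offset, (key, value) in enumerate(log)
        if latest_offset[key] == offset and value is not None
    ]


log = [
    ("user-1", "v1"),
    ("user-2", "v1"),
    ("user-1", "v2"),
    ("user-2", None),  # tombstone: asks compaction to forget "user-2"
]
print(compact(log))  # [('user-1', 'v2')]
```

With a real producer client the equivalent step is sending a record for the key with a null value.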
The only other way to lower the size of your Kafka log would be through the log retention configuration settings.
The size-based and time-based retention settings dictate when logs are deleted in Kafka. log.retention.bytes defaults to -1, and I'm pretty sure leaving it at -1 lets the time-based config alone determine when a log gets deleted.
Log retention and compaction work separately from each other. With retention, logs can be deleted after a certain time or size even with log compaction on. So if you theoretically have a 100 MB log limit and set log.retention.bytes=104857600 (100 MB), Kafka will compact your log until it reaches 100 MB in size, and will then delete the necessary messages (oldest first) until the log is under 100 MB in size.
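As a rough illustration of the oldest-first, size-based deletion described above, here is a simplified model in plain Python. It is a sketch, not Kafka's implementation: `apply_size_retention` is a made-up helper, the segment sizes are invented, and Kafka itself deletes whole segments with its own threshold rules, so the exact point at which deletion stops may differ.

```python
from typing import List

MB = 1024 * 1024


def apply_size_retention(segment_sizes: List[int], retention_bytes: int) -> List[int]:
    """Drop the oldest segments (front of the list) while the total exceeds retention_bytes.

    A retention_bytes of -1 models "no size limit", mirroring the default mentioned above.
    """
    remaining = list(segment_sizes)
    if retention_bytes < 0:
        return remaining
    total = sum(remaining)
    while remaining and total > retention_bytes:
        total -= remaining.pop(0)  # oldest segment goes first
    return remaining


segments = [40 * MB, 40 * MB, 40 * MB]  # oldest first, 120 MB in total
kept = apply_size_retention(segments, 100 * MB)  # 100 * MB == 104857600
print(sum(kept) // MB)  # 80
```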
EDIT:
It turns out that log retention and compaction are mutually exclusive, based on the link provided by mechanikos. It does seem odd, though, that Kafka is designed so that a compacted log can grow indefinitely with no way of ever deleting old messages.
This question is quite old, but I thought I'd give the latest update on the matter. There is a feature (https://issues.apache.org/jira/browse/KAFKA-4015) that lets a log use compaction and deletion together; it has already been resolved and is scheduled for the 0.10.1.0 release.
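As a sketch of what that could look like on a topic, assuming the change lets `cleanup.policy` take a comma-separated list of policies: the values below are purely illustrative and `to_properties` is a hypothetical helper that just renders `key=value` lines.

```python
from typing import Dict

# Illustrative topic-level settings; the numbers are examples, not recommendations.
topic_config: Dict[str, str] = {
    "cleanup.policy": "compact,delete",  # assumption: list form, both policies enabled
    "retention.ms": str(7 * 24 * 60 * 60 * 1000),  # 7 days in milliseconds
    "retention.bytes": str(100 * 1024 * 1024),  # 100 MB
}


def to_properties(config: Dict[str, str]) -> str:
    """Render the settings as sorted key=value lines."""
    return "\n".join(f"{key}={value}" for key, value in sorted(config.items()))


print(to_properties(topic_config))
# cleanup.policy=compact,delete
# retention.bytes=104857600
# retention.ms=604800000
```

With both policies set, the log would still be compacted by key, while segments past the time or size limits become eligible for deletion.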