I'm trying to understand how Kafka compaction works and have the following question: does Kafka guarantee uniqueness of keys for messages stored in a topic with compaction enabled?

Thanks!
Looking at the 4 guarantees of Kafka compaction, number 4 states that all delete markers for deleted records will be seen by a consumer, provided the consumer reaches the head of the log in less than the topic's `delete.retention.ms` setting.
So, you will have more than one value for a key if the head of the topic is not being retained by the `delete.retention.ms` policy.

As I understand it, if you set a 24h retention policy (`delete.retention.ms=86400000`), you'll have a unique value for a single key among all messages older than 24h. That's your *at least one*, but not *only one*, as many other messages for the same key may have arrived during the last 24 hours. So it is guaranteed that you'll catch at least one value, but not just the last, because retention hasn't acted on the recent messages.
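For reference, here's a minimal sketch of creating a compacted topic with that 24h setting via the Java `Admin` client; the broker address and topic name are hypothetical, and note that `delete.retention.ms` specifically governs how long delete markers (tombstones) are kept:

```java
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

import java.util.List;
import java.util.Map;
import java.util.Properties;

public class CreateCompactedTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker

        try (Admin admin = Admin.create(props)) {
            NewTopic topic = new NewTopic("my-compacted-topic", 1, (short) 1) // hypothetical name
                    .configs(Map.of(
                            "cleanup.policy", "compact",
                            // keep tombstones for 24h, as in the example above
                            "delete.retention.ms", "86400000"));
            admin.createTopics(List.of(topic)).all().get();
        }
    }
}
```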
edit. As cricket's comment states, even if you set a delete retention property of 1 day, `log.roll.ms` is what defines when a log segment is closed, based on the message's timestamp. Since this last, open segment is never eligible for compaction, it becomes the second factor that prevents you from having just the last value for a known key. If your topic starts at T0, then messages after T0 + `log.roll.ms` will be on the open log segment, and thus not compacted.
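In practice this means a plain consumer reading such a topic from the beginning still has to keep the last value per key itself. A minimal sketch, assuming String keys and values and the same hypothetical topic name (a single poll for brevity; a real reader would loop until caught up):

```java
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

import java.time.Duration;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;

public class LastValuePerKeyConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "compaction-demo");         // hypothetical group
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        Map<String, String> latest = new HashMap<>();
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("my-compacted-topic")); // hypothetical topic
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(5));
            for (ConsumerRecord<String, String> record : records) {
                // Later records for the same key overwrite earlier ones --
                // exactly what compaction would eventually do on the broker.
                latest.put(record.key(), record.value());
            }
        }
        latest.forEach((k, v) -> System.out.println(k + " -> " + v));
    }
}
```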
Short answer is no. Kafka doesn't guarantee uniqueness of keys stored in a topic, even with compaction enabled.
In Kafka you have two types of `cleanup.policy`:

- `delete` - after the configured time, messages won't be available anymore. Several properties can be used for that: `log.retention.hours`, `log.retention.minutes`, `log.retention.ms`. By default `log.retention.hours` is set to `168`, which means messages older than 7 days will be deleted.
- `compact` - for each key, at least one message will be available. In some situations it will be exactly one, but in most cases it will be more. The compaction process runs periodically in the background; it copies log parts, removing duplicates and leaving only the last value for each key.

If you want to read only one value for each key, you have to use the `KTable<K,V>` abstraction from Kafka Streams, as in the sketch below.
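A minimal Kafka Streams sketch of that idea: `builder.table(...)` materializes the topic as a `KTable`, so each key resolves to its latest value regardless of whether the broker has physically compacted older records yet (broker address, application id, and topic name are hypothetical):

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.KTable;

import java.util.Properties;

public class LatestValuePerKey {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "latest-value-demo");  // hypothetical app id
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");  // assumed broker

        StreamsBuilder builder = new StreamsBuilder();
        // A KTable treats each record as an update to its key, so the table
        // always reflects the latest value per key.
        KTable<String, String> table = builder.table(
                "my-compacted-topic", // hypothetical topic name
                Consumed.with(Serdes.String(), Serdes.String()));

        // For demonstration, print every update; each key converges to its latest value.
        table.toStream().foreach((key, value) -> System.out.println(key + " -> " + value));

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
    }
}
```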
Related question regarding the latest value for a key and compaction: Kafka only subscribe to latest message?