With the at-least-once guarantee, I understand that duplicates are possible in case of failures. However:
1) How frequently does the Kafka Streams library perform commits?
2) Do users ever need to consider committing in addition to the above?
3) Is there a best practice on how frequently commits should be performed?
Answer 1:
Kafka Streams commits at regular intervals, which can be configured via the parameter commit.interval.ms
(the default is 30 seconds; if exactly-once processing is enabled, the default is 100 ms).
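As a minimal sketch of how this parameter might be set, the snippet below builds a configuration object with an explicit commit interval. It uses plain string keys so that it compiles without the Kafka jars on the classpath; in a real application you would typically use the constants in StreamsConfig and pass the Properties to the KafkaStreams constructor. The application id, the broker address, and the 5-second interval are assumptions chosen for illustration only.

```java
import java.util.Properties;

public class CommitIntervalConfig {

    // Builds a Kafka Streams style configuration with an explicit commit interval.
    // The keys are the documented config names; the values below are illustrative.
    public static Properties buildConfig(long commitIntervalMs) {
        Properties props = new Properties();
        props.put("application.id", "commit-interval-demo"); // hypothetical application name
        props.put("bootstrap.servers", "localhost:9092");    // assumed local broker
        props.put("processing.guarantee", "at_least_once");  // the guarantee discussed in the question
        props.put("commit.interval.ms", Long.toString(commitIntervalMs));
        return props;
    }

    public static void main(String[] args) {
        // Request a commit roughly every 5 seconds instead of the 30-second default.
        Properties props = buildConfig(5_000L);
        System.out.println("commit.interval.ms = " + props.getProperty("commit.interval.ms"));
    }
}
```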
Usually, it's not necessary for users to commit manually. Note, though, that users don't have complete control over committing and can only request commits: cf. How to commit manually with Kafka Stream?
Commits are synchronization points, and if you commit too frequently (as an extreme example, after every processed record), your throughput can drop significantly. The right frequency is also highly application dependent, because the commit frequency determines how many potential duplicates the application processes (this also depends on the input data rate). Thus, you need to consider how many duplicates you are willing to tolerate in case of failure. It also depends on how long it will take the application to reprocess the data: during this time, the application might not be fully available. Overall, it's hard to give a recommendation, and you need to consider the described trade-offs for each application individually.
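To make the trade-off concrete, here is a back-of-the-envelope estimate, not something Kafka Streams computes for you. It assumes a steady input rate and that, in the worst case, a failure happens just before the next commit, so everything consumed since the last commit is reprocessed. The class name, the method name, and the rate of 10,000 records per second are hypothetical.

```java
public class DuplicateEstimate {

    // Rough upper bound on the records reprocessed after a failure:
    // input rate multiplied by the time since the last commit.
    public static long maxReprocessedRecords(long recordsPerSecond, long commitIntervalMs) {
        return recordsPerSecond * commitIntervalMs / 1000;
    }

    public static void main(String[] args) {
        long rate = 10_000L; // assumed steady input rate, records per second

        // Default at-least-once commit interval of 30 seconds.
        System.out.println(maxReprocessedRecords(rate, 30_000L)); // prints 300000

        // A 100 ms interval bounds the duplicates much more tightly,
        // at the cost of far more frequent commits.
        System.out.println(maxReprocessedRecords(rate, 100L));    // prints 1000
    }
}
```

The same figure also hints at recovery time: the more records that sit between two commits, the longer the application needs to catch up after a restart.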