Apache Kafka Streams: Materializing KTables to a topic

Posted 2019-04-27 04:13

Question:

I'm using Kafka Streams and I'm trying to materialize a KTable into a topic.

It works, but the output only seems to be written every 30 seconds or so.

How and when does Kafka Streams decide to materialize the current state of a KTable into a topic?

Is there any way to shorten this interval and make it more "real-time"?

Here is the actual code I'm using:

// Stream of random ints: (1,1) -> (6,6) -> (3,3)
// one record every 500ms
KStream<Integer, Integer> kStream = builder.stream(Serdes.Integer(), Serdes.Integer(), RandomNumberProducer.TOPIC);

// grouping by key
KGroupedStream<Integer, Integer> byKey = kStream.groupByKey(Serdes.Integer(), Serdes.Integer());

// same behaviour with or without the TimeWindow
KTable<Windowed<Integer>, Long> count = byKey.count(TimeWindows.of(1000L),"total");

// same behaviour with only count.to(Serdes.Integer(), Serdes.Long(), RandomCountConsumer.TOPIC);
count.toStream().map((k,v) -> new KeyValue<>(k.key(), v)).to(Serdes.Integer(), Serdes.Long(), RandomCountConsumer.TOPIC);

Answer 1:

This is controlled by commit.interval.ms, which defaults to 30s. More details here: http://docs.confluent.io/current/streams/developer-guide.html

The semantics of caching is that data is flushed to the state store and forwarded to the next downstream processor node whenever the earliest of commit.interval.ms or cache.max.bytes.buffering (cache pressure) hits.
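A minimal sketch of the configuration change this points to, assuming the application builds its Properties by hand. The keys commit.interval.ms and cache.max.bytes.buffering are the real Kafka Streams settings (also exposed as StreamsConfig.COMMIT_INTERVAL_MS_CONFIG and StreamsConfig.CACHE_MAX_BYTES_BUFFERING_CONFIG; newer releases deprecate the cache key in favour of statestore.cache.max.bytes). The class name, application id and broker address are placeholders. Lowering commit.interval.ms makes flushes more frequent, and setting cache.max.bytes.buffering to 0 disables the record cache so every KTable update is forwarded downstream.

```java
import java.util.Properties;

// Hypothetical helper: builds Kafka Streams settings tuned for lower output latency.
// Plain string keys are used so the sketch compiles without Kafka on the classpath.
final class LowLatencyStreamsConfig {

    static Properties build(long commitIntervalMs, long cacheMaxBytes) {
        Properties props = new Properties();
        props.put("application.id", "random-count-app");   // placeholder application id
        props.put("bootstrap.servers", "localhost:9092");   // assumed broker address
        // How often state is committed (and cached updates flushed); default is 30000 ms.
        props.put("commit.interval.ms", String.valueOf(commitIntervalMs));
        // Total record-cache budget in bytes; 0 disables caching entirely.
        props.put("cache.max.bytes.buffering", String.valueOf(cacheMaxBytes));
        return props;
    }

    public static void main(String[] args) {
        Properties props = build(1000L, 0L);
        props.forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

Lower values trade throughput for latency: more frequent commits and less deduplication in the cache mean more records written to the output and changelog topics.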

See also KIP-63, which introduced this caching:

https://cwiki.apache.org/confluence/display/KAFKA/KIP-63%3A+Unify+store+and+downstream+caching+in+streams
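To see why the output arrives in bursts, here is a toy Java model of the caching behaviour quoted above; it is not Kafka's actual implementation. Assumptions: the hypothetical class CacheFlushModel ignores windowing, charges a fixed, made-up byte cost per entry, evicts in least-recently-updated order, and checks the commit interval only when a record arrives (the real runtime checks it in its processing loop). It replays input shaped like the question's (one record every 500 ms over keys 1 to 6, for 60 s) and records each forwarded update prefixed with the time it was forwarded.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model of a deduplicating record cache (NOT Kafka's real code).
final class CacheFlushModel {
    private final long commitIntervalMs;
    private final long maxBytes;
    private final long bytesPerEntry;                                  // hypothetical fixed entry size
    private final LinkedHashMap<Integer, Long> cache = new LinkedHashMap<>(); // oldest update first
    private final List<String> forwarded = new ArrayList<>();          // "time:key=count" sent downstream
    private long lastCommitMs = 0L;

    CacheFlushModel(long commitIntervalMs, long maxBytes, long bytesPerEntry) {
        this.commitIntervalMs = commitIntervalMs;
        this.maxBytes = maxBytes;
        this.bytesPerEntry = bytesPerEntry;
    }

    void put(long nowMs, int key, long count) {
        cache.remove(key);        // move key to the most-recently-updated position
        cache.put(key, count);    // a newer count replaces (deduplicates) the older one
        // Cache pressure: evict the least recently updated entries, forwarding each.
        while (!cache.isEmpty() && cache.size() * bytesPerEntry > maxBytes) {
            Iterator<Map.Entry<Integer, Long>> it = cache.entrySet().iterator();
            Map.Entry<Integer, Long> eldest = it.next();
            forwarded.add(nowMs + ":" + eldest.getKey() + "=" + eldest.getValue());
            it.remove();
        }
        // Commit interval elapsed: flush everything still cached.
        if (nowMs - lastCommitMs >= commitIntervalMs) {
            cache.forEach((k, v) -> forwarded.add(nowMs + ":" + k + "=" + v));
            cache.clear();
            lastCommitMs = nowMs;
        }
    }

    // Replays 120 records, one every 500 ms, cycling over keys 1..6, counting per key.
    static List<String> simulate(long commitIntervalMs, long maxBytes) {
        CacheFlushModel model = new CacheFlushModel(commitIntervalMs, maxBytes, 1L);
        Map<Integer, Long> counts = new HashMap<>();
        for (int i = 1; i <= 120; i++) {
            int key = (i % 6) + 1;
            long count = counts.merge(key, 1L, Long::sum);
            model.put(500L * i, key, count);
        }
        return model.forwarded;
    }

    public static void main(String[] args) {
        System.out.println("30 s commit, 10 MB cache: " + simulate(30_000L, 10_485_760L));
        System.out.println("cache disabled: " + simulate(30_000L, 0L).size() + " updates");
    }
}
```

With a 30 s commit interval and a cache far larger than six entries, only 12 updates reach the output, in two bursts at 30 s and 60 s, each carrying the latest count per key. With a cache budget of 0, every update is evicted and forwarded as soon as it arrives, so all 120 appear immediately.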