I was evaluating Kafka Streams to see if it fits my use case: I need to aggregate sensor data into 15-minute, hourly, and daily buckets, and the windowing feature looked like a good fit. I can create windows by applying windowedBy() on a KGroupedStream, but the problem is that the windows are created in UTC. I want my data to be grouped by its originating timezone, not by UTC, because the UTC alignment skews the aggregation. Can anyone help me with this?
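For context, this is roughly the kind of windowed aggregation I mean (a minimal sketch; the topic name, key/value types, and the count() aggregation are placeholders for my actual pipeline):

import java.time.Duration;

import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.TimeWindows;
import org.apache.kafka.streams.kstream.Windowed;

StreamsBuilder builder = new StreamsBuilder();
KStream<String, String> readings = builder.stream("sensor-readings"); // topic name is a placeholder

// 15-minute tumbling windows; by default these are aligned to the UTC epoch,
// which is exactly the behaviour I want to avoid.
KTable<Windowed<String>, Long> perQuarterHour = readings
        .groupByKey()
        .windowedBy(TimeWindows.of(Duration.ofMinutes(15)))
        .count();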
Answer 1:
You can "shift" the timestamps using a custom TimestampExtractor
-- before you write the result back into the output topic, you can use a Transformer
and "shift" the timestamps back via context.forward(key, value, To.all().withTimestamps())
.
Feature request ticket: https://issues.apache.org/jira/browse/KAFKA-7911
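A minimal sketch of the "shift back" step, assuming a fixed IST (UTC+05:30) offset; the class name, generics, and offset constant are illustrative and not part of any Kafka API:

import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.kstream.Transformer;
import org.apache.kafka.streams.processor.ProcessorContext;
import org.apache.kafka.streams.processor.To;

public class ShiftBackTransformer<K, V> implements Transformer<K, V, KeyValue<K, V>> {

    // Assumed fixed offset (IST, UTC+05:30); adjust for the originating timezone.
    private static final long OFFSET_MS = (5 * 60 + 30) * 60 * 1000L;

    private ProcessorContext context;

    @Override
    public void init(final ProcessorContext context) {
        this.context = context;
    }

    @Override
    public KeyValue<K, V> transform(final K key, final V value) {
        // Undo the shift applied by the custom TimestampExtractor before the
        // result is written to the output topic.
        context.forward(key, value, To.all().withTimestamp(context.timestamp() - OFFSET_MS));
        return null; // record already forwarded above
    }

    @Override
    public void close() { }
}

It would be wired in with something like stream.transform(ShiftBackTransformer::new) just before the final to("output-topic") call.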
Answer 2:
To solve this, I created a custom TimestampExtractor and used it to base the stream's window creation time on the record timestamp carried in the payload, as shown below.
import java.sql.Timestamp;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.streams.processor.TimestampExtractor;
import com.google.gson.JsonObject;
import com.google.gson.JsonParser;

public class RecordTimeStampExtractor implements TimestampExtractor {

    @Override
    public long extract(ConsumerRecord<Object, Object> record, long previousTimestamp) {
        // Use the timestamp carried in the payload (the SLOT field) rather than the broker/producer timestamp.
        JsonObject data = (JsonObject) new JsonParser().parse(record.value().toString());
        Timestamp recordTimestamp = Timestamp.valueOf(data.get(Constant.SLOT).getAsString());
        return recordTimestamp.getTime();
    }
}
I have been testing this with my local timezone (IST, UTC+05:30) since yesterday and it works fine; Kafka Streams now creates windows based on the record timestamps. I will test with other timezones as well and update the answer.
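For reference, the extractor can be registered globally through the streams configuration (a sketch; the application id and bootstrap servers are placeholders):

import java.util.Properties;
import org.apache.kafka.streams.StreamsConfig;

Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "sensor-aggregation");  // placeholder id
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");   // placeholder broker
// Every input topic will then use the payload timestamp for windowing.
props.put(StreamsConfig.DEFAULT_TIMESTAMP_EXTRACTOR_CLASS_CONFIG, RecordTimeStampExtractor.class);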