How to Handle Different Timezone in Kafka Streams?

2020-02-13 02:48发布

So I was evaluating the Kafka Streams and what it can do to see if it can fit my use case as I needed to do the aggregation of sensor's data for each 15min, Hourly, Daily and found it useful due to its Windowing feature. As I can create windows by applying windowedBy() on KGroupedStream but the problem is that windows are created in UTC and i want my data to be grouped by its originating timezone not by UTC Timezone as it hampers the aggregation so can any one help me on this.

2条回答
家丑人穷心不美
2楼-- · 2020-02-13 03:18

You can "shift" the timestamps using a custom TimestampExtractor -- before you write the result back into the output topic, you can use a Transformer and "shift" the timestamps back via context.forward(key, value, To.all().withTimestamps()).

Feature request ticket: https://issues.apache.org/jira/browse/KAFKA-7911

查看更多
小情绪 Triste *
3楼-- · 2020-02-13 03:31

So to solve this issue I created custom TimestampExtractor and used it to change the streams window creation time to record time from the payload as show below.

public class RecordTimeStampExtractor implements TimestampExtractor {

    @Override
    public long extract(ConsumerRecord<Object, Object> record, long previousTimestamp) {
        JsonObject data = (JsonObject) new JsonParser().parse(record.value().toString());
        Timestamp recordTimestamp = Timestamp.valueOf(data.get(Constant.SLOT).getAsString());
        return recordTimestamp.getTime();
    }

}

so now I have tested it with my local timezone since yesterday which is IST 05:30 and its working fine also kafka streams are creating windows based on records timestamp. Will test with other timezone as well and update the answer

查看更多
登录 后发表回答