Camus Migration - Kafka HDFS Connect does not start

Published 2019-02-20 11:37

Question:

I am currently using the Confluent HDFS Sink Connector (v4.0.0) to replace Camus. We are dealing with sensitive data, so we need to keep offsets consistent during the cutover to the connector.

Cutover plan:

  1. We created the HDFS sink connector and subscribed it to a topic, which writes to a temporary HDFS file. This creates a consumer group whose name starts with connect-.
  2. Stopped the connector using a DELETE request.
  3. Using the /usr/bin/kafka-consumer-groups script, I am able to set the connector consumer group's current offset on the Kafka topic partition to the desired value (i.e. the last offset Camus wrote + 1). See the command sketch after this list.
  4. When I restart the HDFS sink connector, it continues reading from the last committed connector offset and ignores the value I set. I am expecting the HDFS file name to look like: hdfs_kafka_topic_name+kafkapartition+Camus_offset+Camus_offset_plus_flush_size.format
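
For reference, steps 2 and 3 can be run roughly like this. This is a minimal sketch: the connector name hdfs-sink, topic my-topic, partition 0, the offset value, and the host/port values are placeholders, not taken from the actual setup.

    # Step 2: remove the connector via the Connect REST API
    # (assumes Connect listens on localhost:8083 and the connector is named hdfs-sink)
    curl -X DELETE http://localhost:8083/connectors/hdfs-sink

    # Step 3: move the connector's consumer group to the desired offset
    # (the group name follows the connect-<connector name> convention;
    #  12346 stands in for "last offset Camus wrote + 1")
    /usr/bin/kafka-consumer-groups --bootstrap-server localhost:9092 \
      --group connect-hdfs-sink \
      --topic my-topic:0 \
      --reset-offsets --to-offset 12346 \
      --execute

Note that the offset reset only succeeds while the group is inactive, which is why the connector is deleted first.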

Is my expectation of the Confluent connector's behavior correct?

Answer 1:

When you restart this connector, it will use the offset embedded in the name of the last file it wrote to HDFS. It will not use the consumer group offset. It does this because it uses a write-ahead log to achieve exactly-once delivery to HDFS.
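
You can see where that offset comes from by listing the connector's output directory. A sketch, assuming the default topics.dir and the default partitioner; the topic name, partition, and offsets shown are placeholders:

    # Data files encode the committed offset range in their names:
    #   <topic>+<partition>+<startOffset>+<endOffset>.<format>
    hdfs dfs -ls /topics/my-topic/partition=0
    # e.g. my-topic+0+0000012300+0000012399.avro
    # On restart the connector resumes from <endOffset> + 1 of the latest
    # such file, regardless of what the consumer group offset says.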