Kafka Connect HDFS Sink for JSON format using JsonConverter

Published 2019-09-11 16:17

Question:

I produce to and consume from Kafka in JSON, and I want to save to HDFS in JSON using the properties below:

key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
key.converter.schemas.enable=false
value.converter.schemas.enable=false
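Note that with schemas.enable=false the JsonConverter accepts bare JSON with no schema envelope, so a record value as simple as the following is valid (the field names here are hypothetical, just for illustration):

{"temperature": 23.5, "unit": "C"}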

Producer:

# The REST Proxy expects each message to be wrapped in a "records" array
curl -X POST -H "Content-Type: application/vnd.kafka.json.v1+json" \
      --data '{"records": [{"value": {"schema": {"type": "boolean", "optional": false, "name": "bool", "version": 2, "doc": "the documentation", "parameters": {"foo": "bar"}}, "payload": true}}]}' \
      "http://localhost:8082/topics/test_hdfs_json"

Consumer (the Connect standalone worker running the HDFS sink):

./bin/connect-standalone etc/schema-registry/connect-avro-standalone.properties etc/kafka-connect-hdfs/quickstart-hdfs.properties
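For reference, the quickstart-hdfs.properties passed above is a small connector config along these lines (a sketch based on the Confluent quickstart; hdfs.url and flush.size are assumptions for a local single-node setup, and the topic is taken from the question):

name=hdfs-sink
connector.class=io.confluent.connect.hdfs.HdfsSinkConnector
tasks.max=1
topics=test_hdfs_json
hdfs.url=hdfs://localhost:9000
flush.size=3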

Issue-1:

With

key.converter.schemas.enable=true
value.converter.schemas.enable=true

I get the following exception:

org.apache.kafka.connect.errors.DataException: JsonDeserializer with schemas.enable requires "schema" and "payload" fields and may not contain additional fields
    at org.apache.kafka.connect.json.JsonConverter.toConnectData(JsonConverter.java:332)
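The error means that with schemas.enable=true the JsonConverter requires every key and value to be exactly a two-field schema/payload envelope; any other shape (including a bare JSON key) is rejected. A minimal conforming value, trimmed down from the producer example above:

{"schema": {"type": "boolean", "optional": false}, "payload": true}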

Issue-2:

With the two properties above set back to false, there is no exception, but no data is written to HDFS.
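One thing worth ruling out here: the HDFS connector only commits a file once flush.size records have accumulated for a topic partition, so producing fewer test records than that looks like silence. A quick way to check for output files, assuming the connector's default topics.dir of /topics:

hadoop fs -ls -R /topics/test_hdfs_json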

Any suggestions would be highly appreciated.

Thanks

Answer 1:

The converter determines how data from the Kafka topic is translated for the connector to interpret and write to HDFS. Out of the box, the HDFS connector only supports writing to HDFS in Avro or Parquet. You can find information on how to extend the format to JSON here. If you make such an extension, I encourage you to contribute it to the open source project for the connector.
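As a side note, later releases of kafka-connect-hdfs bundle a JSON format class; if your version ships io.confluent.connect.hdfs.json.JsonFormat (an assumption to verify against your installed version), selecting it in the connector properties is all that is needed:

format.class=io.confluent.connect.hdfs.json.JsonFormat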



Answer 2:

To write incoming JSON-format messages to HDFS, set the properties below:

key.converter=org.apache.kafka.connect.storage.StringConverter
value.converter=org.apache.kafka.connect.storage.StringConverter
key.converter.schemas.enable=false
value.converter.schemas.enable=false
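With StringConverter the record value is passed through as an opaque string rather than parsed, so the schema/payload envelope from the question is no longer required. A sketch of producing a plain JSON message via the REST Proxy under this setup (the field names are hypothetical; the topic is the one from the question):

curl -X POST -H "Content-Type: application/vnd.kafka.json.v1+json" \
      --data '{"records": [{"value": {"device": "sensor-1", "reading": 42}}]}' \
      "http://localhost:8082/topics/test_hdfs_json"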