I produce to and consume from Kafka in JSON, and want to save the data to HDFS in JSON, using the properties below:
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
key.converter.schemas.enable=false
value.converter.schemas.enable=false
Producer:
curl -X POST -H "Content-Type: application/vnd.kafka.json.v1+json" \
--data '{"schema": {"type": "boolean", "optional": false, "name": "bool", "version": 2, "doc": "the documentation", "parameters": {"foo": "bar"}}, "payload": true}' \
"http://localhost:8082/topics/test_hdfs_json"
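(Note: the REST Proxy's JSON produce API expects messages wrapped in a "records" array, so the request may need to look roughly like the following; the outer wrapper is the REST Proxy envelope, while the schema/payload object stays the message value:)
curl -X POST -H "Content-Type: application/vnd.kafka.json.v1+json" \
  --data '{"records": [{"value": {"schema": {"type": "boolean", "optional": false}, "payload": true}}]}' \
  "http://localhost:8082/topics/test_hdfs_json"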
Consumer (HDFS sink connector):
./bin/connect-standalone etc/schema-registry/connect-avro-standalone.properties etc/kafka-connect-hdfs/quickstart-hdfs.properties
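For reference, the sink configuration in etc/kafka-connect-hdfs/quickstart-hdfs.properties looks roughly like the following (the topic name, HDFS URL, and flush size below are placeholders, not my actual values):
name=hdfs-sink
connector.class=io.confluent.connect.hdfs.HdfsSinkConnector
tasks.max=1
topics=test_hdfs_json
hdfs.url=hdfs://localhost:8020
flush.size=3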
Issue-1:
key.converter.schemas.enable=true
value.converter.schemas.enable=true
I get the following exception:
org.apache.kafka.connect.errors.DataException: JsonDeserializer with schemas.enable requires "schema" and "payload" fields and may not contain additional fields
at org.apache.kafka.connect.json.JsonConverter.toConnectData(JsonConverter.java:332)
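(As I understand it, with schemas.enable=true the JsonConverter expects every message to be exactly a two-field envelope with no extra top-level fields, for example:)
{"schema": {"type": "boolean", "optional": false}, "payload": true}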
Issue-2:
With the two properties above set to false (as in the configuration at the top), no error is thrown, but no data is written to HDFS.
Any suggestions would be highly appreciated.
Thanks
The converter controls how the data from the Kafka topic is translated before it is interpreted by the connector and written to HDFS. Out of the box, the HDFS connector only supports writing to HDFS in Avro or Parquet. You can find information on how to extend the format to JSON here. If you make such an extension, I encourage you to contribute it to the connector's open source project.
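If you do end up with a JSON Format implementation (or use a connector release that ships one), the sink selects it through the format.class property. A minimal sketch, assuming the implementation is named io.confluent.connect.hdfs.json.JsonFormat (adjust to whatever your class is actually called):
# added to the HDFS sink properties file
format.class=io.confluent.connect.hdfs.json.JsonFormat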
For input JSON messages to be written to HDFS, set the properties below:
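For example, one possible configuration (a sketch only; it assumes the messages should pass through as plain strings, so it uses the StringConverter with schemas disabled):
key.converter=org.apache.kafka.connect.storage.StringConverter
value.converter=org.apache.kafka.connect.storage.StringConverter
key.converter.schemas.enable=false
value.converter.schemas.enable=false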