I am trying to load a data file into Kafka in a loop (to check stats) instead of typing on standard input. After downloading Kafka, I performed the following steps:
Started ZooKeeper:
bin/zookeeper-server-start.sh config/zookeeper.properties
Started the Kafka server:
bin/kafka-server-start.sh config/server.properties
Created a topic named "test":
bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test
Ran the Producer:
bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test
Listened with the Consumer:
bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic test --from-beginning
Instead of standard input, I want to pass a data file to the Producer so that its contents show up directly in the Consumer. Or is there another Kafka producer, other than the console producer, that can read data files? Any help would really be appreciated. Thanks!
You can read the data file with cat and pipe it to kafka-console-producer.sh:
cat ${datafile} | ${kafka_home}/bin/kafka-console-producer.sh --broker-list ${brokerlist} --topic test
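Since the question asks about sending the file in a loop, the same pipeline can be fed by a small shell function that writes the file to stdout several times. This is only a minimal sketch: repeat_file is a hypothetical helper name introduced here, not something shipped with Kafka.

```shell
#!/bin/sh
# Hypothetical helper (not part of Kafka): write FILE to stdout COUNT times.
# Usage: repeat_file FILE COUNT
repeat_file() {
    file=$1
    count=$2
    i=0
    while [ "$i" -lt "$count" ]; do
        cat "$file"
        i=$((i + 1))
    done
}
```

Assuming the broker and topic from the question, it could be used like this:
repeat_file ${datafile} 10 | ${kafka_home}/bin/kafka-console-producer.sh --broker-list ${brokerlist} --topic test
Piping all repetitions into one producer process avoids starting a new JVM for every pass over the file.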
If there is always a single file, you can just use the tail command and pipe its output to the Kafka console producer.
But if a new file is created when certain conditions are met, you may need to use org.apache.commons.io.monitor (from Apache Commons IO) to watch for newly created files, then repeat the above for each one.
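If pulling in a Java library is more than you need, a plain polling loop in the shell is a rough substitute for the file monitor. The sketch below is not the Commons IO approach; it swaps in simple polling. new_files is a hypothetical helper name, and the "seen" list file is an assumption introduced for illustration.

```shell
#!/bin/sh
# Hypothetical helper (polling, not Commons IO): print every regular file in
# DIR that is not yet recorded in SEEN, and record it so later calls skip it.
# SEEN must live outside DIR, otherwise it would be reported as a new file.
# Usage: new_files DIR SEEN
new_files() {
    dir=$1
    seen=$2
    touch "$seen"
    for f in "$dir"/*; do
        [ -f "$f" ] || continue
        # -F fixed string, -x whole-line match, -q quiet
        if ! grep -Fqx -- "$f" "$seen"; then
            printf '%s\n' "$f"
            printf '%s\n' "$f" >> "$seen"
        fi
    done
}
```

A loop could then call new_files every few seconds and cat each reported file into the console producer as shown above. Note that this sketch may pick up a file that is still being written, so it fits best when files are moved into the directory once complete.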
You can also try the kafkacat utility.
The README on GitHub provides examples.
It would be great if you could share which tool worked best for you :)
Details from the kafkacat README:
Read messages from stdin, produce to 'syslog' topic with snappy compression
$ tail -f /var/log/syslog | kafkacat -b mybroker -t syslog -z snappy
kafka-console-producer.sh \
--broker-list localhost:9092 \
--topic my_topic \
--new-producer < my_file.txt
Follow this link: http://grokbase.com/t/kafka/users/157b71babg/kafka-producer-input-file
Kafka ships with a built-in File Stream Connector that can pipe the content of a file into a topic (file source) or write the content of a topic out to a file (file sink).
To read from a file, use bin/connect-standalone.sh. The source file and target topic are configured in config/connect-file-source.properties, and the worker settings in config/connect-standalone.properties.
So the command will be:
bin/connect-standalone.sh config/connect-standalone.properties config/connect-file-source.properties
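For orientation, a file source configuration has roughly the shape written out by the sketch below. The key names follow the sample config/connect-file-source.properties that ships with Kafka; the function name write_source_config, the input path, and the topic value are placeholders assumed here for illustration.

```shell
#!/bin/sh
# Hypothetical helper: write a file-source connector config to the given path.
# The 'file' and 'topic' values are placeholders; adjust them to your setup.
# Usage: write_source_config PATH
write_source_config() {
    cat > "$1" <<'EOF'
name=local-file-source
connector.class=FileStreamSource
tasks.max=1
file=/path/to/my_file.txt
topic=test
EOF
}
```

You would then pass the generated file as the second argument to bin/connect-standalone.sh in place of config/connect-file-source.properties. Depending on your Kafka version, the file connector may need to be made available to the worker explicitly, so check the Connect documentation for your release.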