I am trying to load a data file in loop(to check stats) instead of standard input in Kafka. After downloading Kafka, I performed the following steps:
Started zookeeper:
bin/zookeeper-server-start.sh config/zookeeper.properties
Started Server:
bin/kafka-server-start.sh config/server.properties
Created a topic named "test":
bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test
Ran the Producer:
bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test
Test1
Test2
Listened by the Consumer:
bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic test --from-beginning
Test1
Test2
Instead of Standard input, I want to pass a data file to the Producer which can be seen directly by the Consumer. Or is there any kafka producer instead of console consumer using which I can read data files. Any help would really be appreciated. Thanks!
If there is always a single file, you can just use tail command and then pipeline it to kafka console producer.
But if a new file will be created when some conditions met, you may need use apache.commons.io.monitor to monitor new file created, then repeat above.
You can read data file via cat and pipeline it to kafka-console-producer.sh.
Follow this link: http://grokbase.com/t/kafka/users/157b71babg/kafka-producer-input-file
Kafka has this built-in File Stream Connector, for piping the content of a file to producer(file source), or directing file content to another destination(file sink).
We have
bin/connect-standalone.sh
to read from file which can be configured inconfig/connect-file-source.properties
andconfig/connect-standalone.properties
.So the command will be:
You can probably try the kafkacat utility as well. The readme on Github provides examples
It would be great if you could share which tool worked the best for you :)
Details from KafkaCat Readme:
Read messages from stdin, produce to 'syslog' topic with snappy compression