I'm a Kafka newbie.
I have been trying out file-reading examples and applying them to my project for a couple of weeks. However, my application does not work the way I intended, so I'm asking for your advice.
My intention is:
- The Kafka producer reads files from directory A.
- Storm consumes the data produced in step 1.
- Once a file has been read, it is moved to another directory.
Condition:
- Files are continuously arriving in directory A.
It is simple logic, but it is giving me a headache.
So far I have created and tested the Kafka producer code in Eclipse on my local machine.
My thinking was that, because the Kafka producer has to keep reading files, the process should stay alive even after all the files in directory A have been read. Instead, it terminates as soon as all the files in directory A have been read and sent.
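What I think I need is a long-running loop along these lines. This is only a rough sketch, not my actual code: the directory paths, the topic name, and the class name are placeholders, and the producer would be built from the properties shown below.

import java.nio.charset.StandardCharsets;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardWatchEventKinds;
import java.nio.file.WatchEvent;
import java.nio.file.WatchKey;
import java.nio.file.WatchService;

import kafka.javaapi.producer.Producer;
import kafka.producer.KeyedMessage;

public class DirectoryWatchingProducer {

    // Placeholder paths and topic name, for illustration only
    private static final Path INPUT_DIR = Paths.get("/data/A");
    private static final Path DONE_DIR = Paths.get("/data/A-done");
    private static final String TOPIC = "test";

    public static void watchAndProduce(Producer<String, String> producer) throws Exception {
        Files.createDirectories(DONE_DIR);
        WatchService watcher = FileSystems.getDefault().newWatchService();
        INPUT_DIR.register(watcher, StandardWatchEventKinds.ENTRY_CREATE);

        // The endless loop is what keeps the producer process alive
        // after the files that were already in directory A are drained.
        while (true) {
            WatchKey key = watcher.take(); // blocks until new files arrive
            for (WatchEvent<?> event : key.pollEvents()) {
                Path file = INPUT_DIR.resolve((Path) event.context());
                String payload = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);

                // Send the file contents as one message
                producer.send(new KeyedMessage<String, String>(TOPIC, payload));

                // Move the file away once it has been read and sent
                Files.move(file, DONE_DIR.resolve(file.getFileName()),
                        StandardCopyOption.REPLACE_EXISTING);
            }
            key.reset(); // re-arm the key so further events are delivered
        }
    }
}

In practice I would probably also have to make sure a file is completely written before reading it, but I left that out of the sketch.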
I run Kafka on a single node with 3 brokers, and the producer properties are set as follows.
Properties props = new Properties();
props.put("metadata.broker.list", "localhost:9092");
props.put("serializer.class", "kafka.serializer.StringEncoder");
props.put("producer.type", "sync");
props.put("request.required.acks", "1");
The topic was created with the following command.
bin/kafka-topics.sh --create --zookeeper localhost:2181 --partitions 1 --replication-factor 1 --topic test
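For what it's worth, I can check on the command line that messages arrive by using the console consumer that ships with Kafka (assuming the same ZooKeeper address):

bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic test --from-beginning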
Is my idea of continuously reading files wrong from Kafka's architectural perspective, or is there a way to do this that I haven't found yet? I would really appreciate it if someone could answer my questions.
You should use kafka.serializer.DefaultSerializer (binary). How are you monitoring the folder for new files? You could use something like apache.commons.io.monitor. Take a look here.
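A minimal sketch of that apache.commons.io.monitor approach could look like this (the directory, the poll interval, and what happens inside onFileCreate are placeholders, not a complete solution):

import java.io.File;

import org.apache.commons.io.monitor.FileAlterationListenerAdaptor;
import org.apache.commons.io.monitor.FileAlterationMonitor;
import org.apache.commons.io.monitor.FileAlterationObserver;

public class DirectoryMonitorExample {
    public static void main(String[] args) throws Exception {
        FileAlterationObserver observer = new FileAlterationObserver(new File("/data/A"));
        observer.addListener(new FileAlterationListenerAdaptor() {
            @Override
            public void onFileCreate(File file) {
                // read the file, send its contents to Kafka, then move it out of the folder
            }
        });

        // Poll the directory every 10 seconds; start() launches a background thread
        // that keeps running, so the process does not terminate on its own.
        FileAlterationMonitor monitor = new FileAlterationMonitor(10000, observer);
        monitor.start();
    }
}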
Where are you stuck? What are the problems you need to address (error messages, anything really)? Asking because it looks like you want a complete solution from someone, which is not what SO provides. Dig in and ask specific questions, and of course, post the code.