Since I'm not allowed to set up Flume on the production servers, I download the logs, drop them into a Flume spooling directory (spoolDir), and have a sink consume from the channel and write to Cassandra. This all works fine.
However, I have a lot of log files in the spoolDir, and the current setup only processes one file at a time, so ingestion is slow. I'd like to process many files concurrently. One option I've considered is to distribute the files across 5-10 different directories and define a separate source/channel/sink for each one, but that feels clumsy. Is there a better way to achieve this?
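For reference, the directory-splitting workaround I have in mind could be prepared with a small script like the one below. This is only a sketch: the function name `distribute`, the `spool-N` directory naming, and the paths are my own placeholders, not anything Flume defines. It assumes the download directory and the spool directories are on the same filesystem, so each file appears in its spool directory through a single rename rather than a gradual copy (the spooling directory source expects files to be complete and unchanged once they land).

```python
import os
from pathlib import Path


def distribute(src_dir, spool_root, n_dirs):
    """Move files from src_dir round-robin into spool_root/spool-0 .. spool-(n_dirs-1).

    Returns a dict mapping each file name to its new path.
    Directory names are hypothetical; point each Flume source at one of them.
    """
    src = Path(src_dir)
    targets = [Path(spool_root) / f"spool-{i}" for i in range(n_dirs)]
    for target in targets:
        target.mkdir(parents=True, exist_ok=True)

    # Sort for a deterministic assignment of files to directories.
    files = sorted(p for p in src.iterdir() if p.is_file())

    placed = {}
    for i, path in enumerate(files):
        dest = targets[i % n_dirs] / path.name
        # os.replace is a rename; it raises OSError across filesystems,
        # which is the assumption stated above.
        os.replace(path, dest)
        placed[path.name] = dest
    return placed
```

Usage would be something like `distribute("/data/downloaded-logs", "/data/flume", 5)`, with one spooling directory source configured per `spool-N` directory.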
Thanks
Just for the record, this has been answered on the Flume user mailing list:
Hari Shreedharan wrote:
http://mail-archives.apache.org/mod_mbox/flume-user/201409.mbox/browser