Reading Flume spoolDir in parallel

Posted 2019-05-14 09:41

Since I'm not allowed to set up Flume on prod servers, I have to download the logs, put them in a Flume spoolDir and have a sink to consume from the channel and write to Cassandra. Everything is working fine.

However, I have a lot of log files in the spoolDir and the current setup processes only one file at a time, so it is slow. I want to process many files concurrently. One option I thought of is to keep using spoolDir but distribute the files across 5-10 different directories and define multiple sources/channels/sinks, but this is a bit clumsy. Is there a better way to achieve this?

Thanks

Tags: apache flume flume-ng

1 answer

看我几分像从前

#2 · 2019-05-14 10:11

Just for the record, this has been answered on the Flume mailing list:

Hari Shreedharan wrote:

Unfortunately, no. The spoolDir source was kept single-threaded so that deserializer implementations can be kept simple. The approach with multiple spoolDir sources is the correct one, though they can all write to the same channel(s) - so you'd need only a larger number of sources, they can all share the same channel(s) and you don't need more sinks unless you want to pull data out faster.

http://mail-archives.apache.org/mod_mbox/flume-user/201409.mbox/browser
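
For illustration, here is a minimal sketch of what that could look like in an agent configuration: three spoolDir sources feeding one shared channel, with a single sink draining it. The agent name (agent1), the component names, the directory paths and the channel sizes are all assumptions made up for this example, and the sink type is left as a placeholder because the Cassandra sink from the question is not shown in the thread.

```properties
# Hypothetical agent "agent1": several spooldir sources, one shared channel, one sink.
agent1.sources = spool1 spool2 spool3
agent1.channels = ch1
agent1.sinks = cassandraSink

# Each source watches its own directory (paths are placeholders).
agent1.sources.spool1.type = spooldir
agent1.sources.spool1.spoolDir = /data/flume/spool1
agent1.sources.spool1.channels = ch1

agent1.sources.spool2.type = spooldir
agent1.sources.spool2.spoolDir = /data/flume/spool2
agent1.sources.spool2.channels = ch1

agent1.sources.spool3.type = spooldir
agent1.sources.spool3.spoolDir = /data/flume/spool3
agent1.sources.spool3.channels = ch1

# One channel shared by all sources (sizes are illustrative, not tuned).
agent1.channels.ch1.type = memory
agent1.channels.ch1.capacity = 10000
agent1.channels.ch1.transactionCapacity = 1000

# Keep the existing sink definition and point it at the shared channel.
# agent1.sinks.cassandraSink.type = <your existing Cassandra sink class>
agent1.sinks.cassandraSink.channel = ch1
```

Each source still reads its directory one file at a time, so the degree of parallelism equals the number of sources. As the quoted reply notes, a second sink on the same channel would only be worth adding if the single sink cannot drain events fast enough.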
