I am new to Flume. My Flume agent has an HTTP server as its source, from which it receives zip files (compressed XML files) at regular intervals. The zip files are very small (less than 10 MB), and I want their extracted contents written to the HDFS sink. Please share some ideas on how to do this. Do I have to write a custom interceptor?
Flume will try to read your files line by line, unless you configure a specific deserializer. A deserializer lets you control how a file is parsed and split into events. You could of course follow the example of the BlobDeserializer, which is designed for binary payloads such as PDFs, but I understand that you actually want to unpack the archives and then read them line by line. In that case you would need to write a custom deserializer that reads the ZIP stream and emits one event per line.
Here's the reference in the documentation:
https://flume.apache.org/FlumeUserGuide.html#event-deserializers
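For concreteness, here is a minimal, untested sketch of such a deserializer. The interfaces it implements (`EventDeserializer`, `EventDeserializer.Builder`, `ResettableInputStream`) are Flume's real plugin API; the package and class names are my own inventions. Note that deserializers hook into file-reading sources such as the Spooling Directory Source, so you would typically land the HTTP uploads in a spool directory first (or put the same unzip logic into a custom handler for the HTTP Source instead).

```java
package com.example.flume;                       // hypothetical package

import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipInputStream;

import org.apache.flume.Context;
import org.apache.flume.Event;
import org.apache.flume.event.EventBuilder;
import org.apache.flume.serialization.EventDeserializer;
import org.apache.flume.serialization.ResettableInputStream;

/**
 * Illustrative deserializer: unpacks a ZIP stream and emits one Flume
 * event per line of each archived file.
 */
public class ZipLineDeserializer implements EventDeserializer {

  private final ZipInputStream zin;
  private boolean open = true;

  ZipLineDeserializer(Context context, ResettableInputStream in) {
    // ResettableInputStream is not a java.io.InputStream, so adapt it
    // before handing it to ZipInputStream.
    this.zin = new ZipInputStream(new InputStream() {
      @Override public int read() throws IOException {
        return in.read();
      }
      @Override public int read(byte[] b, int off, int len) throws IOException {
        return in.read(b, off, len);
      }
    });
  }

  @Override
  public Event readEvent() throws IOException {
    String line = readLine();
    return line == null ? null
        : EventBuilder.withBody(line, StandardCharsets.UTF_8);
  }

  @Override
  public List<Event> readEvents(int numEvents) throws IOException {
    List<Event> events = new ArrayList<>(numEvents);
    for (int i = 0; i < numEvents; i++) {
      Event e = readEvent();
      if (e == null) break;
      events.add(e);
    }
    return events;
  }

  // Reads one line, transparently advancing to the next ZIP entry when
  // the current one is exhausted. Byte-per-char decoding is a shortcut
  // that only works for ASCII-safe XML; a real version should decode
  // each line's bytes with the proper charset.
  private String readLine() throws IOException {
    StringBuilder sb = new StringBuilder();
    while (true) {
      int c = zin.read();
      if (c == -1) {                       // end of entry (or of stream)
        if (sb.length() > 0) return sb.toString();
        if (zin.getNextEntry() == null) return null;
        continue;                          // new entry opened, keep reading
      }
      if (c == '\n') return sb.toString();
      if (c != '\r') sb.append((char) c);
    }
  }

  // Caveat: a reliable deserializer should remember its position in
  // mark() and rewind in reset() so Flume can replay a failed channel
  // transaction. That is hard to do through a decompressor, so this
  // sketch leaves both as no-ops; with files this small you can mostly
  // sidestep it by making the source batch size larger than the line
  // count of one file.
  @Override public void mark() throws IOException { }
  @Override public void reset() throws IOException { }

  @Override
  public void close() throws IOException {
    if (open) {
      open = false;
      zin.close();
    }
  }

  /** Flume instantiates deserializers through this nested Builder. */
  public static class Builder implements EventDeserializer.Builder {
    @Override
    public EventDeserializer build(Context context, ResettableInputStream in) {
      return new ZipLineDeserializer(context, in);
    }
  }
}
```

Assuming a Spooling Directory Source (paths are placeholders), you would wire it in by pointing the `deserializer` property at the fully qualified name of the nested Builder:

```
agent.sources.spool.type = spooldir
agent.sources.spool.spoolDir = /var/flume/incoming-zips
agent.sources.spool.deserializer = com.example.flume.ZipLineDeserializer$Builder
```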