I am very new to Hadoop, so please excuse the dumb questions.
My understanding so far: Hadoop's best use case is large files, since large files keep MapReduce jobs efficient.
Keeping the above in mind, I am somewhat confused about Flume NG. Assume I am tailing a log file and new log lines are produced every second; the moment the log gets a new line, it is transferred to HDFS via Flume.
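For reference, here is roughly the kind of agent configuration I have in mind from the user guide (the agent name, component names, and paths are placeholders I made up; the roll* settings seem to control how often a new HDFS file is started, which is part of what I am asking in (a) below):

```
# Sketch of a Flume NG agent that tails a log file into HDFS.
# All names and paths here are made-up placeholders.
agent1.sources = tailSrc
agent1.channels = memCh
agent1.sinks = hdfsSink

# Exec source: runs tail -F and emits one Flume event per log line.
agent1.sources.tailSrc.type = exec
agent1.sources.tailSrc.command = tail -F /var/log/app.log
agent1.sources.tailSrc.channels = memCh

# In-memory buffer between the source and the sink.
agent1.channels.memCh.type = memory
agent1.channels.memCh.capacity = 10000

# HDFS sink: writes events out and rolls to a new file by time/size.
agent1.sinks.hdfsSink.type = hdfs
agent1.sinks.hdfsSink.channel = memCh
agent1.sinks.hdfsSink.hdfs.path = hdfs://namenode:8020/flume/logs
agent1.sinks.hdfsSink.hdfs.fileType = DataStream
agent1.sinks.hdfsSink.hdfs.rollInterval = 300
agent1.sinks.hdfsSink.hdfs.rollSize = 134217728
agent1.sinks.hdfsSink.hdfs.rollCount = 0
```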
a) Does this mean that Flume creates a new HDFS file for every line logged in the file I am tailing, or does it append to an existing HDFS file?
b) Is append even allowed in HDFS in the first place?
c) If the answer to (b) is yes, i.e. contents are appended constantly, how and when should I run my MapReduce application?
These questions may sound very silly, but answers to them would be highly appreciated.
PS: I have not set up Flume NG or Hadoop yet; I am just reading articles to understand them and how they could add value to my company.