I'm chaining multiple MapReduce jobs and want to pass along/store some meta information (e.g. configuration or name of original input) with the results. At least the file "_SUCCESS" and anything in the directory "_logs" seem to be ignored.
Are there any filename patterns which are by default ignored by the InputReader? Or is this just a fixed limited list?
FileInputFormat applies a hiddenFileFilter by default, which rejects every path whose name starts with "_" or ".". So if you use any FileInputFormat (such as TextInputFormat, KeyValueTextInputFormat, or SequenceFileInputFormat), these hidden files will be ignored.
You can use FileInputFormat.setInputPathFilter to set your own PathFilter. Remember that the hiddenFileFilter is always active, so a custom filter can only exclude additional files; it cannot bring hidden files back.
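To make the combined behaviour concrete, here is a minimal sketch in plain Java that only imitates the rule; it does not use the Hadoop classes. The class name HiddenFilterSketch, the CUSTOM_FILTER that skips "*.tmp" names, and the file names (including "_meta.properties") are assumptions made up for illustration. In a real job the custom part would be a class implementing org.apache.hadoop.fs.PathFilter, registered with FileInputFormat.setInputPathFilter.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

public class HiddenFilterSketch {
    // Imitates the rule of FileInputFormat's hiddenFileFilter:
    // reject any name that starts with "_" or ".".
    static final Predicate<String> HIDDEN_FILE_FILTER =
            name -> !name.startsWith("_") && !name.startsWith(".");

    // Hypothetical custom filter (an assumption for this sketch): skip "*.tmp" names.
    static final Predicate<String> CUSTOM_FILTER =
            name -> !name.endsWith(".tmp");

    // A name is kept only if BOTH filters accept it, which is why the
    // hidden-file rule stays in effect even when a custom filter is set.
    static List<String> selectInputs(List<String> names, Predicate<String> custom) {
        return names.stream()
                .filter(HIDDEN_FILE_FILTER.and(custom))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> names = List.of(
                "part-r-00000",       // regular job output
                "_SUCCESS",           // marker file, hidden by the "_" prefix
                "_logs",              // log directory, hidden by the "_" prefix
                ".part-r-00000.crc",  // checksum file, hidden by the "." prefix
                "_meta.properties",   // hypothetical metadata file, also hidden
                "scratch.tmp");       // rejected by the custom filter only
        System.out.println(selectInputs(names, CUSTOM_FILTER)); // prints [part-r-00000]
    }
}
```

For the chaining use case in the question, this suggests that metadata stored next to the results under a name beginning with "_" would be skipped by the next job's FileInputFormat without any extra configuration.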