Although I use Hadoop frequently on my Ubuntu machine, I have never thought about the _SUCCESS and part-r-00000 files. The output always resides in the part-r-00000 file, but what is the use of the _SUCCESS file? And why does the output file have the name part-r-00000? Is there any significance or nomenclature behind it, or is the name just arbitrarily defined?
See http://www.cloudera.com/blog/2010/08/what%E2%80%99s-new-in-apache-hadoop-0-21/
The _SUCCESS file is an empty marker written into the output directory when the job completes successfully. It would typically be used by job scheduling systems (such as Oozie) to denote that follow-on processing on the contents of this directory can commence, as all the data has been output.
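As an illustration, a downstream step can check for the marker with the HDFS FileSystem API before consuming the directory. This is a minimal sketch; the class name SuccessCheck and the /user/hadoop/wordcount/output path are hypothetical placeholders:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SuccessCheck {
    public static void main(String[] args) throws Exception {
        // Hypothetical output directory of an earlier MapReduce job
        Path outputDir = new Path("/user/hadoop/wordcount/output");

        FileSystem fs = FileSystem.get(new Configuration());

        // The framework writes an empty _SUCCESS marker into the output
        // directory on successful completion; downstream steps can poll
        // for it before reading the part-* files.
        Path marker = new Path(outputDir, "_SUCCESS");
        if (fs.exists(marker)) {
            System.out.println("Job finished - safe to read part-* files");
        } else {
            System.out.println("No _SUCCESS marker yet - output may be incomplete");
        }
    }
}
```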
Update (in response to comment)
The output files are by default named part-x-yyyyy, where:

- x is either 'm' or 'r', depending on whether the file was written by a map task (map-only job) or a reduce task
- yyyyy is the mapper or reducer task number (zero-based)

So a job which has 32 reducers will have files named part-r-00000 to part-r-00031, one for each reducer task.
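To see the naming in action, here is a minimal driver sketch (the class name PartFileDemo and the input/output arguments are hypothetical; it relies on the default identity mapper and reducer, so it simply repartitions text input). Running it with 32 reduce tasks produces part-r-00000 through part-r-00031 plus the _SUCCESS marker in the output directory:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class PartFileDemo {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "part-file demo");
        job.setJarByClass(PartFileDemo.class);

        // No mapper/reducer set: the default identity classes pass records
        // through. In a real job you would plug in your own classes here.

        // 32 reduce tasks => output files part-r-00000 .. part-r-00031;
        // with 0 reduce tasks the map tasks write part-m-xxxxx files instead.
        job.setNumReduceTasks(32);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```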