I would like to write multiple output files. How do I do this using Job instead of JobConf?
An easy way to create key-based output file names:
MultipleTextOutputFormat class
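A minimal sketch of such a subclass, assuming Text keys and values (adjust the generics to your job's output types). Note that MultipleTextOutputFormat lives in the old org.apache.hadoop.mapred API, so this route still goes through JobConf; overriding generateFileNameForKeyValue controls which file each record lands in:

```java
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.lib.MultipleTextOutputFormat;

// Names each output file after the record's key instead of the
// default part-NNNNN.
public class KeyBasedOutput extends MultipleTextOutputFormat<Text, Text> {
    @Override
    protected String generateFileNameForKeyValue(Text key, Text value, String name) {
        // Use the key itself as the file name under the output directory.
        return key.toString();
    }
}
```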
Job config
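Wiring it into the job config might look like the following sketch (KeyBasedOutput is the hypothetical subclass above; MyDriver and the /input path are placeholders):

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobConf;

JobConf conf = new JobConf(MyDriver.class);
conf.setOutputKeyClass(Text.class);
conf.setOutputValueClass(Text.class);
// Use the key-based output format instead of the default TextOutputFormat.
conf.setOutputFormat(KeyBasedOutput.class);
FileInputFormat.setInputPaths(conf, new Path("/input"));
FileOutputFormat.setOutputPath(conf, new Path("/output"));
```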
Run this code and you'll see one output file per distinct key in HDFS, where /output is the job output directory. Hope it helps.
The docs say to use org.apache.hadoop.mapreduce.lib.output.MultipleOutputs instead. Below is a snippet of code that uses MultipleOutputs. Unfortunately I didn't write it and haven't spent much time with it, so I don't know exactly why things are where they are. I'm sharing it in the hope that it helps. :)
Job Setup
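A sketch of the driver side, assuming a word-count-style job with Text/IntWritable output (MyDriver, MyMapper, MyReducer, and the named output "text" are all placeholder names). The key call is MultipleOutputs.addNamedOutput, which registers a named output that the reducer can then write to by name:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

Configuration conf = new Configuration();
Job job = Job.getInstance(conf, "multiple outputs example");
job.setJarByClass(MyDriver.class);
job.setMapperClass(MyMapper.class);
job.setReducerClass(MyReducer.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
FileInputFormat.addInputPath(job, new Path("/input"));
FileOutputFormat.setOutputPath(job, new Path("/output"));
// Register a named output called "text"; the reducer refers to it by name.
// Named output names must be alphanumeric.
MultipleOutputs.addNamedOutput(job, "text", TextOutputFormat.class,
        Text.class, IntWritable.class);
System.exit(job.waitForCompletion(true) ? 0 : 1);
```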
Reducer Setup
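And a matching reducer sketch: create the MultipleOutputs instance in setup(), write through it in reduce(), and close it in cleanup(). Forgetting to close it is a classic way to lose output:

```java
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;

public class MyReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private MultipleOutputs<Text, IntWritable> mos;

    @Override
    protected void setup(Context context) {
        mos = new MultipleOutputs<>(context);
    }

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable v : values) {
            sum += v.get();
        }
        // Write to the "text" named output registered in the driver; the
        // optional fourth argument is a base output path, so files are
        // named after the key (e.g. <key>-r-00000) instead of part-r-00000.
        mos.write("text", key, new IntWritable(sum), key.toString());
    }

    @Override
    protected void cleanup(Context context)
            throws IOException, InterruptedException {
        mos.close(); // flushes and closes all underlying record writers
    }
}
```

If you only ever write through MultipleOutputs, you may also want LazyOutputFormat.setOutputFormat(job, TextOutputFormat.class) in the driver, so the job doesn't create empty default part files alongside the named outputs.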
EDIT: Added link to MultipleOutputs.