Hadoop: How can I merge reducer outputs to a single file?

Posted 2019-04-03 19:29

Question:

This question already has an answer here:

  • merge output files after reduce phase (10 answers)

I know that the getmerge command in the shell can do this work.

But what should I do if I want to merge these outputs after the job through the HDFS API for Java?

What I actually want is a single merged file on HDFS.

The only thing I can think of is to start an additional job after that.

Thanks!

Answer 1:

But what should I do if I want to merge these outputs after the job through the HDFS API for Java?

Guessing, because I haven't tried this myself, but I think the method you are looking for is FileUtil.copyMerge, which is the method FsShell invokes when you run the -getmerge command. FileUtil.copyMerge takes two FileSystem objects as arguments: FsShell uses FileSystem.getLocal to retrieve the destination FileSystem, but I don't see any reason you couldn't instead call Path.getFileSystem on the destination path, so that the merged output stream is opened on HDFS rather than on the local disk.
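A minimal sketch of that idea, assuming a Hadoop 2.x classpath (FileUtil.copyMerge was removed in Hadoop 3) and hypothetical paths /user/me/job-output and /user/me/merged-output:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;

public class MergeOnHdfs {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Hypothetical paths: the job's output directory and the merged target.
        Path srcDir = new Path("/user/me/job-output");
        Path dstFile = new Path("/user/me/merged-output");
        // Resolve a FileSystem for each path; both can be HDFS,
        // so the merged file never has to land on local disk.
        FileSystem srcFs = srcDir.getFileSystem(conf);
        FileSystem dstFs = dstFile.getFileSystem(conf);
        // copyMerge concatenates every file under srcDir into dstFile.
        // deleteSource=false keeps the part files; addString=null inserts
        // no separator between them.
        FileUtil.copyMerge(srcFs, srcDir, dstFs, dstFile, false, conf, null);
    }
}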

That said, I don't think it wins you very much: the data still streams through the local JVM during the merge, so you aren't really saving much over -getmerge followed by -put.



Answer 2:

You get a single output file by configuring a single reducer in your code:

job.setNumReduceTasks(1);

This will work for your requirement, but it is costly: every record passes through that one reduce task.
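As a sketch, this is where the call sits in a standard MapReduce driver (the job name here is a hypothetical placeholder):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class SingleReducerDriver {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "single-output-job");
        // Route all map output through one reduce task, so the job
        // writes exactly one part file (part-r-00000) to the output dir.
        job.setNumReduceTasks(1);
        // ... set mapper, reducer, and input/output paths as usual ...
    }
}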


OR


Alternatively, invoke the merge from Java by shelling out with org.apache.hadoop.util.Shell.execCommand, a static method to execute a shell command. It covers most of the simple cases without requiring the user to implement the Shell interface.

Parameters:
  env - the map of environment key=value pairs
  cmd - the shell command to execute
Returns:
  the output of the executed command

org.apache.hadoop.util.Shell.execCommand(Map<String,String> env, String... cmd)
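For illustration, a sketch that shells out to -getmerge and then -put (the paths are hypothetical; execCommand returns the command's output and throws an IOException on a non-zero exit code):

import java.io.IOException;
import org.apache.hadoop.util.Shell;

public class MergeViaShell {
    public static void main(String[] args) throws IOException {
        // Merge the part files down to a local temp file...
        Shell.execCommand("hadoop", "fs", "-getmerge",
                "/user/me/job-output", "/tmp/merged");
        // ...then push the single merged file back up to HDFS.
        Shell.execCommand("hadoop", "fs", "-put",
                "/tmp/merged", "/user/me/merged-output");
    }
}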