Hadoop: difference between 0 reducers and the identity reducer

I am trying to confirm my understanding of the difference between 0 reducers and the identity reducer.

0 reducers means the reduce step is skipped and the mapper output is the final output.
Identity reducer means shuffling/sorting will still take place?

Tags: hadoop mapreduce

4 answers

Reply #2 · 2019-01-08 14:18

Another use case for the identity reducer is to consolidate all the results into <# of reducers> output files. This can be handy if you are using Amazon Web Services to write to S3 directly, especially if the mapper output is small (e.g. a grep/search for a record) and you have a lot of mappers (e.g. thousands).
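To make the file-count argument concrete, here is a minimal plain-Java sketch (not Hadoop API code) that simulates a grep-style job. The class name `GrepOutputFiles`, the sample input, and the hash-based partitioning are assumptions made for illustration; a real job would rely on Hadoop's own partitioner and output format.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class GrepOutputFiles {

    // Hypothetical map step: keep only the lines that contain the pattern.
    static List<String> grep(List<String> split, String pattern) {
        List<String> matches = new ArrayList<>();
        for (String line : split) {
            if (line.contains(pattern)) {
                matches.add(line);
            }
        }
        return matches;
    }

    // Zero reducers: every mapper writes its own output file.
    static List<List<String>> mapOnlyFiles(List<List<String>> splits, String pattern) {
        List<List<String>> files = new ArrayList<>();
        for (List<String> split : splits) {
            files.add(grep(split, pattern));
        }
        return files;
    }

    // Identity reducer: matches are partitioned by key hash, sorted,
    // and written out once per reducer.
    static List<List<String>> identityReducerFiles(
            List<List<String>> splits, String pattern, int numReducers) {
        List<List<String>> files = new ArrayList<>();
        for (int r = 0; r < numReducers; r++) {
            files.add(new ArrayList<>());
        }
        for (List<String> split : splits) {
            for (String line : grep(split, pattern)) {
                int partition = (line.hashCode() & Integer.MAX_VALUE) % numReducers;
                files.get(partition).add(line);
            }
        }
        for (List<String> file : files) {
            Collections.sort(file);
        }
        return files;
    }

    // Hypothetical input: 1000 single-line splits, four of which contain "ERROR".
    static List<List<String>> sampleSplits() {
        List<List<String>> splits = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            String status = (i % 250 == 0) ? "ERROR" : "OK";
            splits.add(List.of("id=" + i + " status=" + status));
        }
        return splits;
    }

    public static void main(String[] args) {
        List<List<String>> splits = sampleSplits();
        System.out.println("map-only files: " + mapOnlyFiles(splits, "ERROR").size());
        System.out.println("identity-reducer files: "
                + identityReducerFiles(splits, "ERROR", 2).size());
    }
}
```

With this sample input the map-only variant yields 1000 files, most of them empty, while the identity-reducer variant with 2 reducers yields 2 files that together hold the 4 matching lines.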


Ridiculous、

Reply #3 · 2019-01-08 14:27

It depends on your business requirements. If you are doing a word count, you need a reduce step to combine the map output into totals. If you just want to change the words to upper case, you don't need a reduce step.


The star\"

Reply #4 · 2019-01-08 14:32

Your understanding is correct. I would define it as follows: if you do not need the map results sorted, you set 0 reducers, and the job is called map-only.
If you need the map results sorted but do not need any aggregation, you choose the identity reducer.
And to complete the picture, there is a third case: you do need aggregation, and in that case you need a real reducer.
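The three cases can be sketched in plain Java as below. This is only a simulation of the data flow under simplifying assumptions, not Hadoop API code: the class `ReducerModes`, its method names, and the hash partitioning are hypothetical, and each "mapper" simply emits `(word, 1)` for every word in its split.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class ReducerModes {

    // Case 1, zero reducers: each mapper's (word, 1) records are written
    // as-is, one output file per mapper, in emission order.
    static List<List<String>> mapOnly(List<List<String>> splits) {
        List<List<String>> files = new ArrayList<>();
        for (List<String> split : splits) {
            List<String> file = new ArrayList<>();
            for (String word : split) {
                file.add(word + "\t1");
            }
            files.add(file);
        }
        return files;
    }

    // Shuffle and sort: route each key to one partition and group its
    // values, keeping keys in sorted order within the partition.
    static List<TreeMap<String, List<Integer>>> shuffle(
            List<List<String>> splits, int numReducers) {
        List<TreeMap<String, List<Integer>>> partitions = new ArrayList<>();
        for (int r = 0; r < numReducers; r++) {
            partitions.add(new TreeMap<>());
        }
        for (List<String> split : splits) {
            for (String word : split) {
                int p = (word.hashCode() & Integer.MAX_VALUE) % numReducers;
                partitions.get(p).computeIfAbsent(word, k -> new ArrayList<>()).add(1);
            }
        }
        return partitions;
    }

    // Case 2, identity reducer: emit every (key, value) pair unchanged,
    // but now partitioned and sorted by key.
    static List<List<String>> identityReduce(List<List<String>> splits, int numReducers) {
        List<List<String>> files = new ArrayList<>();
        for (TreeMap<String, List<Integer>> partition : shuffle(splits, numReducers)) {
            List<String> file = new ArrayList<>();
            for (Map.Entry<String, List<Integer>> e : partition.entrySet()) {
                for (int value : e.getValue()) {
                    file.add(e.getKey() + "\t" + value);
                }
            }
            files.add(file);
        }
        return files;
    }

    // Case 3, aggregating reducer: emit one (key, sum) pair per key.
    static List<List<String>> sumReduce(List<List<String>> splits, int numReducers) {
        List<List<String>> files = new ArrayList<>();
        for (TreeMap<String, List<Integer>> partition : shuffle(splits, numReducers)) {
            List<String> file = new ArrayList<>();
            for (Map.Entry<String, List<Integer>> e : partition.entrySet()) {
                int sum = 0;
                for (int value : e.getValue()) {
                    sum += value;
                }
                file.add(e.getKey() + "\t" + sum);
            }
            files.add(file);
        }
        return files;
    }

    public static void main(String[] args) {
        List<List<String>> splits = List.of(List.of("b", "a"), List.of("a", "c"));
        System.out.println(mapOnly(splits));
        System.out.println(identityReduce(splits, 1));
        System.out.println(sumReduce(splits, 1));
    }
}
```

For the two sample splits, map-only keeps two unsorted files, the identity reducer produces one file with the same four records sorted by key, and the summing reducer collapses the two `a` records into a single total.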


仙女界的扛把子

Reply #5 · 2019-01-08 14:33

The main difference between "no reducer" (mapred.reduce.tasks=0) and the "standard reducer", which is IdentityReducer (mapred.reduce.tasks=1 or more), is that with "no reducer" there is no partitioning or shuffling after the map stage. In that case you get the 'pure' output of your mappers without any further processing. This is useful for development and debugging, among other things.

