How to increase the number of mappers and reducers in Hadoop

If I increase the number of mappers and decrease the number of reducers, will the performance of a job change (for better or worse) during execution?

I would also like to ask how to set the number of mappers and reducers. I have never played with this setting, so I don't know how it works. I know Hadoop, but I have written little code against it directly, since I mostly use Hive.

Also, if I want to increase the number of mappers and reducers, how do I set it, and up to what value? Does it depend on the number of instances (let's say 10)?

Please reply; I want to try this and check the performance. Thanks.

Tags: hadoop mapper reducers

4 answers

smile是对你的礼貌

#2 · 2019-01-26 02:46

Changing the number of mappers is a pure optimization and should not affect the results. Set the number so that it fully utilizes your cluster (if the cluster is dedicated). Try a number of mappers per node equal to the number of cores. Watch CPU utilization and increase the number until you reach almost full CPU utilization or your system starts swapping. You may need fewer mappers than cores if you do not have enough memory.
The number of reducers does affect the results (each reducer writes its own output file), so if you need a specific number of reducers (such as 1), set it.
If you can handle the results of any number of reducers, do the same optimization as with the mappers.
In theory you can become I/O bound during this tuning process, so pay attention to that as well when tuning the number of tasks. You can recognize it by low CPU utilization despite an increase in the mapper/reducer count.
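
As a starting point for the reducer side of this tuning, here is a minimal sketch of the rule of thumb given in the Hadoop MapReduce tutorial: 0.95 or 1.75 multiplied by (number of nodes * maximum containers per node). The class and method names are made up for illustration, the 10 nodes come from the question, and the 4 containers per node is purely an assumed value; measure on your own cluster before settling on a number.

```java
// Illustrative helper, not part of Hadoop: names and sample values are assumptions.
class ReducerHeuristic {

    // factor is typically 0.95 (all reducers start at once, in a single wave)
    // or 1.75 (faster nodes run a second wave, which balances load better).
    static int suggestedReducers(int nodes, int containersPerNode, double factor) {
        return (int) Math.round(factor * (nodes * containersPerNode));
    }

    public static void main(String[] args) {
        int nodes = 10;             // the instance count mentioned in the question
        int containersPerNode = 4;  // assumed value; depends on cores and memory per node

        System.out.println("single wave: " + suggestedReducers(nodes, containersPerNode, 0.95));
        System.out.println("two waves:   " + suggestedReducers(nodes, containersPerNode, 1.75));
    }
}
```

The result is only an initial guess; from there, adjust while watching CPU utilization and swapping as described above.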


smile是对你的礼貌

#3 · 2019-01-26 02:58

You can increase the number of mappers based on the block size and split size. One of the easiest ways is to decrease the split size, as shown below:

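import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

Configuration conf = new Configuration();
// Maximum split size in bytes; a smaller value yields more splits and therefore more mappers.
conf.set("mapred.max.split.size", "1020");
Job job = new Job(conf, "My job name");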
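
To see why lowering the maximum split size produces more mappers, here is a rough sketch of the sizing rule used by FileInputFormat, which picks max(minSize, min(maxSize, blockSize)) as the split size. The class and method names are invented for this example, and the 1 GiB file and 128 MiB block size are assumed values. The count is an estimate for a single splittable file: it ignores the small slack Hadoop allows on the last split and simply treats an empty file as zero splits.

```java
// Illustrative estimator, not Hadoop code: names and sample sizes are assumptions.
class SplitEstimator {

    // Same shape as the rule FileInputFormat applies: max(minSize, min(maxSize, blockSize)).
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Ceiling division: roughly one mapper per split of a splittable file.
    static long estimateSplits(long fileSize, long splitSize) {
        if (fileSize <= 0) {
            return 0; // simplification for this sketch
        }
        return (fileSize + splitSize - 1) / splitSize;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long fileSize = 1024 * mb;  // hypothetical 1 GiB input file
        long blockSize = 128 * mb;  // assumed HDFS block size

        long defaultSplit = computeSplitSize(blockSize, 1L, Long.MAX_VALUE);
        long smallerSplit = computeSplitSize(blockSize, 1L, 16 * mb);

        System.out.println("no max set: " + estimateSplits(fileSize, defaultSplit) + " splits");
        System.out.println("max 16 MiB: " + estimateSplits(fileSize, smallerSplit) + " splits");
    }
}
```

Very small maximum values multiply the number of map tasks quickly, and each task has startup overhead, so it is worth lowering the value in steps and measuring.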


#4 · 2019-01-26 02:59

I have tried the suggestion from @Animesh Raj Jha by modifying mapred.max.split.size and got a noticeable performance increase.


你好瞎i

#5 · 2019-01-26 03:04

I am using Hadoop 2.2 and don't know how to set the maximum input split size. I would like to decrease this value in order to create more mappers. I tried updating yarn-site.xml, but it does not work.

Indeed, Hadoop 2.2 / YARN does not take any of the following settings into account:

<property>
<name>mapreduce.input.fileinputformat.split.minsize</name>
<value>1</value>
</property>
<property>
<name>mapreduce.input.fileinputformat.split.maxsize</name>
<value>16777216</value>
</property>

<property>
<name>mapred.min.split.size</name>
<value>1</value>
</property>
<property>
<name>mapred.max.split.size</name>
<value>16777216</value>
</property>

best
