Tips to improve MapReduce job performance in Hadoop

Posted 2019-01-27 09:30

I have 100 mappers and 1 reducer running in a job. How can I improve the job's performance?

As I understand it, using a combiner can improve performance to a great extent. But what else do we need to configure to improve job performance?

1 Answer

对你真心纯属浪费 · answered 2019-01-27 09:55

With the limited data in this question (input file size, HDFS block size, average map processing time, number of mapper and reducer slots in the cluster, etc.), we can't suggest specific tips.

But there are some general guidelines to improve the performance.

  1. If each task takes less than 30-40 seconds, reduce the number of tasks.
  2. If a job has more than 1 TB of input, consider increasing the block size of the input dataset to 256 MB or even 512 MB so that the number of tasks is smaller.
  3. As long as each task runs for at least 30-40 seconds, increase the number of map tasks to some multiple of the number of mapper slots in the cluster.
  4. The number of reduce tasks per job should be equal to or a bit less than the number of reduce slots in the cluster.
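The task-count tips above can be applied at job submission time. A minimal sketch, assuming Hadoop 2.x property names and that the job's main class parses generic options (e.g. via `ToolRunner`); `job.jar`, `MyJob`, and the input/output paths are placeholders:

```shell
# 256 MB minimum split size -> fewer, longer-running map tasks,
# and reducers set to (a bit less than) the cluster's reduce slots.
# Property names are standard Hadoop 2.x configuration keys.
hadoop jar job.jar MyJob \
  -D mapreduce.input.fileinputformat.split.minsize=268435456 \
  -D mapreduce.job.reduces=8 \
  input_dir output_dir
```

The same properties can equally be set in `mapred-site.xml` or on the `Job` object in the driver code.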

Some more tips:

  1. Configure the cluster properly, with the right diagnostic tools.
  2. Use compression when writing intermediate data to disk.
  3. Tune the number of map and reduce tasks as per the tips above.
  4. Incorporate a combiner wherever it is appropriate.
  5. Use the most appropriate data types for the output (do not use LongWritable when the output values fit in the Integer range; IntWritable is the right choice in that case).
  6. Reuse Writables instead of allocating new objects for every record.
  7. Use the right profiling tools.
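Tip 2 (compressing intermediate data) is a configuration change, not a code change. A sketch for `mapred-site.xml`, assuming Hadoop 2.x property names and that the Snappy codec is available on the cluster:

```xml
<!-- Compress map output before it is spilled to disk and shuffled. -->
<property>
  <name>mapreduce.map.output.compress</name>
  <value>true</value>
</property>
<!-- Snappy trades a little compression ratio for low CPU cost,
     which usually suits intermediate data well. -->
<property>
  <name>mapreduce.map.output.compress.codec</name>
  <value>org.apache.hadoop.io.compress.SnappyCodec</value>
</property>
```

These can also be set per job with `-D` options or on the job's `Configuration`.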

Have a look at this Cloudera article for some more tips.
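To see why a combiner helps (the question's own starting point), here is a minimal pure-Python simulation of a word-count job. The function names (`map_phase`, `combine`) are illustrative only, not part of the Hadoop API; the point is that the combiner pre-aggregates on the map side, so fewer records cross the network in the shuffle:

```python
from collections import Counter

def map_phase(lines):
    """Emit one (word, 1) pair per token, as a word-count mapper would."""
    for line in lines:
        for word in line.split():
            yield (word, 1)

def combine(pairs):
    """Combiner: pre-aggregate pairs on the map side before the shuffle."""
    counts = Counter()
    for word, n in pairs:
        counts[word] += n
    return list(counts.items())

lines = ["the quick brown fox", "the lazy dog", "the end"]
mapped = list(map_phase(lines))
combined = combine(mapped)

# The combiner shrinks the number of records sent over the network:
print(len(mapped))    # 9 records without a combiner
print(len(combined))  # 7 records after merging duplicate keys
```

With 100 mappers and 1 reducer, as in the question, this reduction is multiplied across every map task, which is exactly why a combiner helps so much when the reduce side is the bottleneck.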
