I have 100 mapper and 1 reducer running in a job. How to improve the job performance?
As per my understanding: Use of combiner can improve the performance to great extent. But what else we need to configure to improve the jobs performance?
I have 100 mapper and 1 reducer running in a job. How to improve the job performance?
As per my understanding: Use of combiner can improve the performance to great extent. But what else we need to configure to improve the jobs performance?
With the limited data in this question ( Input file size, HDFS block size, Average map processing time, Number of Mapper slots & Reduce slots in cluster etc.), we can't suggest tips.
But there are some general guidelines to improve the performance.
Some more tips :
LongWritable
when range of output values are inInteger
range.IntWritable
is right choice in this case)Writables
Have a look at this cloudera article for some more tips.