At some point in the driver method of a Hadoop job, we tell the job which classes to use as the Mapper and the Reducer. For example:
job.setMapperClass(MyMapper.class);
job.setReducerClass(MyReducer.class);
Usually the driver method is main, while the mapper and reducer are implemented as static inner classes. Suppose that MyMapper and MyReducer are static inner classes of MyClass, and that the driver method is the main of MyClass. Sometimes I see the following line added right after the two above:
job.setJarByClass(MyClass.class);
What is the meaning of this configuration step, and when is it useful or mandatory?
In my case (I have a single-node cluster installation), if I remove this line, the job still runs correctly. Why?
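
For context, the Javadoc for setJarByClass says it sets the jar "by finding where a given class came from". To show what I understand by that, here is a minimal plain-Java sketch of mapping a class reference to the location it was loaded from. It does not use Hadoop and is not Hadoop's actual code; the names JarLocator and locate are hypothetical, and the nested MyMapper is only a stand-in for a real mapper.

```java
import java.net.URL;

public class JarLocator {

    // Stand-in for a mapper implemented as a static inner class (hypothetical).
    public static class MyMapper { }

    /**
     * Hypothetical helper: returns the URL of the .class file for cls,
     * or null if it cannot be found. A "jar:" URL means the class was
     * loaded from a jar; a "file:" URL means a plain classes directory.
     */
    public static URL locate(Class<?> cls) {
        // Nested classes compile to Outer$Inner.class; getName() already uses that form.
        String resource = cls.getName().replace('.', '/') + ".class";
        ClassLoader loader = cls.getClassLoader();
        // Classes from the bootstrap loader report a null loader, so fall back
        // to the system class loader in that case.
        return (loader != null)
                ? loader.getResource(resource)
                : ClassLoader.getSystemResource(resource);
    }

    public static void main(String[] args) {
        URL url = locate(MyMapper.class);
        System.out.println("loaded from: " + url);
        System.out.println("inside a jar: "
                + (url != null && "jar".equals(url.getProtocol())));
    }
}
```

If I compile and run this from a classes directory I would expect a file: URL, and a jar: URL when it is packaged and run from a jar. My question is whether this lookup is essentially all the call does, and why skipping it makes no difference on my single-node setup.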