Sometimes my MR job complains that MyMapper class in not found. And that i have to give job.setJarByClass(MyMapper.class); to tell it to load it from my jar file.
cloudera@cloudera-vm:/tmp/translator$ hadoop jar MapReduceJobs.jar translator/input/Portuguese.txt translator/output 13/06/13 03:36:57 WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String). 13/06/13 03:36:57 INFO input.FileInputFormat: Total input paths to process : 1 13/06/13 03:36:57 INFO mapred.JobClient: Running job: job_201305100422_0043 13/06/13 03:36:58 INFO mapred.JobClient: map 0% reduce 0% 13/06/13 03:37:03 INFO mapred.JobClient: Task Id : attempt_201305100422_0043_m_000000_0, Status : FAILED java.lang.RuntimeException: java.lang.ClassNotFoundException: com.mapreduce.variousformats.keyvaluetextinputformat.MyMapper at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:996) at org.apache.hadoop.mapreduce.JobContext.getMapperClass(JobContext.java:212) at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:601)
Question : Why does it happen. Why doesn't it always tell me to load it from my jar file. Is there some best practices for tackling these kind of issues. Also if i am using some 3rd party libraries , will i have to do this for them as well.