I am writing a program that receives the source code of the mappers/reducers, dynamically compiles them, and packages the result into a JAR file. It then has to run this JAR file on a Hadoop cluster.
For the last part, I set up all the required parameters dynamically through my code. However, the problem I am now facing is that the code requires the compiled mapper and reducer classes at compile time, but at compile time I do not have these classes; they are only received at run time (e.g. through a message from a remote node). I would appreciate any idea or suggestion on how to get around this problem.
Below is the code for this last part. The problem is that job.setMapperClass(Mapper_Class.class) and job.setReducerClass(Reducer_Class.class) require the class files (Mapper_Class.class and Reducer_Class.class) to be present at compile time:
private boolean run_Hadoop_Job(String className) {
    try {
        System.out.println("Starting to run the code on Hadoop...");
        String[] argsTemp = { "project_test/input", "project_test/output" };
        // create a configuration
        Configuration conf = new Configuration();
        conf.set("fs.default.name", "hdfs://localhost:54310");
        conf.set("mapred.job.tracker", "localhost:54311");
        conf.set("mapred.jar", jar_Output_Folder + java.io.File.separator
                + className + ".jar");
        conf.set("mapreduce.map.class", "Mapper_Reducer_Classes$Mapper_Class.class");
        conf.set("mapreduce.reduce.class", "Mapper_Reducer_Classes$Reducer_Class.class");
        // create a new job based on the configuration
        Job job = new Job(conf, "Hadoop Example for dynamically and programmatically compiling-running a job");
        job.setJarByClass(Platform.class);
        //job.setMapperClass(Mapper_Class.class);
        //job.setReducerClass(Reducer_Class.class);
        // key/value of your reducer output
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(argsTemp[0]));
        // this deletes possible output paths to prevent job failures
        FileSystem fs = FileSystem.get(conf);
        Path out = new Path(argsTemp[1]);
        fs.delete(out, true);
        // finally set the empty out path
        FileOutputFormat.setOutputPath(job, new Path(argsTemp[1]));
        //job.submit();
        System.exit(job.waitForCompletion(true) ? 0 : 1);
        System.out.println("Job Finished!");
    } catch (Exception e) {
        return false;
    }
    return true;
}
Revised: So I revised the code to specify the mapper and reducer using conf.set("mapreduce.map.class", "my mapper.class"). Now the code compiles correctly, but when it is executed it throws the following error:
Dec 24, 2012 6:49:43 AM org.apache.hadoop.mapred.JobClient monitorAndPrintJob
INFO: Task Id : attempt_201212240511_0006_m_000001_2, Status : FAILED
java.lang.RuntimeException: java.lang.ClassNotFoundException: Mapper_Reducer_Classes$Mapper_Class.class
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:809)
    at org.apache.hadoop.mapreduce.JobContext.getMapperClass(JobContext.java:157)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:569)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)
The problem is that the TaskTracker cannot see classes that are only available locally (in your local JRE/classpath).
I figured it out this way (Maven project):
First, add this plugin to pom.xml; it will build your application jar file including all the dependency jars:
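The plugin configuration itself did not come through here; one common way to build such a self-contained ("fat") jar is the maven-assembly-plugin with the jar-with-dependencies descriptor, roughly like this (an illustrative sketch, not necessarily the exact configuration this answer used):

    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-assembly-plugin</artifactId>
      <configuration>
        <descriptorRefs>
          <!-- produces target/<artifact>-jar-with-dependencies.jar on mvn package -->
          <descriptorRef>jar-with-dependencies</descriptorRef>
        </descriptorRefs>
      </configuration>
      <executions>
        <execution>
          <phase>package</phase>
          <goals>
            <goal>single</goal>
          </goals>
        </execution>
      </executions>
    </plugin>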
Then, in the Java source code, add these lines; they make the job use your sample.jar, built to target/sample.jar by the plugin configured above in pom.xml:
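The exact lines were not copied in; presumably they point the job configuration at the assembled jar, along these lines (property key and path taken from this question and answer):

    // ship the assembled fat jar with the job so the TaskTrackers
    // can load the mapper/reducer classes from it
    conf.set("mapred.jar", "target/sample.jar");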
This way, your TaskTrackers can refer to the classes you wrote and the ClassNotFoundException will not occur.
If you don't have them at compile time, then directly set the name in the configuration like this:
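For example (a minimal sketch; the property keys are the same ones already used in the question, and the fully qualified class names are placeholders):

    Configuration conf = new Configuration();
    // the value is a plain fully qualified class name, resolved via
    // Class.forName on the cluster -- no ".class" suffix
    conf.set("mapreduce.map.class", "org.example.MyMapper");
    conf.set("mapreduce.reduce.class", "org.example.MyReducer");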
You only need a reference to the Class object for the class that will be dynamically created. Use
Class.forName("foo.Mapper")
instead of foo.Mapper.class.
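Applied to the job setup above, that looks roughly like this (foo.Mapper and foo.Reducer are placeholder names; Class.forName throws a checked ClassNotFoundException, so keep it inside the existing try/catch):

    // requires org.apache.hadoop.mapreduce.Mapper / Reducer imports
    Class<? extends Mapper> mapperClass =
            Class.forName("foo.Mapper").asSubclass(Mapper.class);
    Class<? extends Reducer> reducerClass =
            Class.forName("foo.Reducer").asSubclass(Reducer.class);
    job.setMapperClass(mapperClass);
    job.setReducerClass(reducerClass);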