I commonly make an executable jar with a main method and run it from the command line with `hadoop jar Some.jar ClassWithMain input output`.
In this main method, the Job and Configuration are set up, and the Job has setters to specify the mapper or reducer class, e.g. `job.setMapperClass(Mapper.class)`.
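For reference, the kind of driver main I mean looks roughly like this (a sketch using the `org.apache.hadoop.mapreduce` API; `MyMapper` and `MyReducer` are placeholder class names):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class ClassWithMain {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "example job");
        job.setJarByClass(ClassWithMain.class);
        job.setMapperClass(MyMapper.class);    // placeholder mapper class
        job.setReducerClass(MyReducer.class);  // placeholder reducer class
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // args[0] = input path, args[1] = output path
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```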
However, when submitting a job remotely, I have to set the jar, the mapper, and other classes through the Hadoop client API:
job.setJarByClass(HasMainMethod.class);
job.setMapperClass(Mapper_Class.class);
job.setReducerClass(Reducer_Class.class);
I want to programmatically transfer the jar from the client to the remote Hadoop cluster and execute it the way the `hadoop jar` command does, so that the main method can specify the mapper and reducer itself.
So how can I deal with this problem?
`hadoop` is only a shell script. Eventually, `hadoop jar` will invoke `org.apache.hadoop.util.RunJar`. What `hadoop jar` does is help you set up the `CLASSPATH`, so you can use `RunJar` directly. For example,
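a minimal sketch of calling `RunJar` from your own code (this assumes the Hadoop jars are already on your application's classpath; `Some.jar` and `ClassWithMain` are the names from your question):

```java
import org.apache.hadoop.util.RunJar;

public class SubmitViaRunJar {
    public static void main(String[] args) throws Throwable {
        // Pass the same arguments you would type after "hadoop jar":
        // the jar file, the main class, then the job's own arguments.
        RunJar.main(new String[] {
            "Some.jar", "ClassWithMain", "input", "output"
        });
    }
}
```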
However, you need to set the `CLASSPATH` correctly before you use it. A convenient way to get the correct `CLASSPATH` is the `hadoop classpath` command: type it and you will get the full `CLASSPATH`. Then set up the `CLASSPATH` before you run your Java application. For example,
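something like the following shell fragment (a sketch; `YourMainClass` is a placeholder for whatever class drives the submission):

```shell
# Capture the full Hadoop classpath and export it before launching the JVM,
# so the application can find the Hadoop client classes and RunJar.
export CLASSPATH=$(hadoop classpath):.
java YourMainClass
```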