Send executable jar to hadoop cluster and run as "hadoop jar"

Posted 2019-07-24 17:54

Question:

I commonly build an executable jar with a main method and run it from the command line with "hadoop jar Some.jar ClassWithMain input output".

In this main method, the Job and Configuration are set up, and the Job class has setters to specify the mapper and reducer classes, e.g. job.setMapperClass(MyMapper.class).

However, when submitting the job remotely, I have to set the jar, the mapper, and other classes through the Hadoop client API:

job.setJarByClass(HasMainMethod.class);
job.setMapperClass(Mapper_Class.class);
job.setReducerClass(Reducer_Class.class);

I want to programmatically transfer the jar from the client to the remote Hadoop cluster and execute it the way the "hadoop jar" command does, so that the main method itself specifies the mapper and reducer.

So how can I deal with this problem?

Answer 1:

hadoop is just a shell script; hadoop jar eventually invokes org.apache.hadoop.util.RunJar. All that hadoop jar does beyond that is set up the CLASSPATH, so you can call RunJar directly.

For example,

String input = "...";
String output = "...";
org.apache.hadoop.util.RunJar.main(
    new String[]{"Some.jar", "ClassWithMain", input, output});
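In essence, RunJar unpacks the jar onto the classpath and then reflectively invokes the named class's main method. A minimal stdlib-only sketch of that core step (MiniRunJar and Demo are hypothetical names standing in for your jar's ClassWithMain; this is an illustration, not Hadoop's actual RunJar):

```java
import java.lang.reflect.Method;

public class MiniRunJar {

    // Hypothetical stand-in for the "ClassWithMain" inside your jar.
    public static class Demo {
        public static String last;
        public static void main(String[] args) {
            last = "ran:" + args[0];
        }
    }

    // The heart of what RunJar does once the jar is on the classpath:
    // look up the class by name and invoke its static main(String[]).
    public static void run(String className, String[] args) throws Exception {
        Class<?> cls = Class.forName(className);
        Method main = cls.getMethod("main", String[].class);
        main.invoke(null, (Object) args); // cast prevents varargs spreading
    }

    public static void main(String[] args) throws Exception {
        run(Demo.class.getName(), new String[]{"input"});
        System.out.println(Demo.last); // prints ran:input
    }
}
```

The real RunJar additionally extracts the jar to a temp directory and builds a class loader over it, which is why getting the CLASSPATH right (below) matters.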

However, you need to set the CLASSPATH correctly before you use it. A convenient way to get the correct CLASSPATH is the hadoop classpath command: run it and it prints the full CLASSPATH.

Then set up the CLASSPATH before running your Java application. Note that java -jar ignores the CLASSPATH environment variable, so pass the classpath explicitly with -cp instead. For example,

java -cp "$(hadoop classpath):YourJar.jar" your.main.Class