How to pass Jar files to shell script in OOZIE she

Posted 2020-07-23 04:32

Question:

Hi, I am getting the error below while running a Java program from a script that is executed by an Oozie shell action in a workflow.

Stdoutput 2015-08-25 03:36:02,636  INFO [pool-1-thread-1] (ProcessExecute.java:68) - Exception in thread "main" java.io.IOException: Error opening job jar: /tmp/jars/first.jar
Stdoutput 2015-08-25 03:36:02,636  INFO [pool-1-thread-1] (ProcessExecute.java:68) -    at org.apache.hadoop.util.RunJar.main(RunJar.java:124)
Stdoutput 2015-08-25 03:36:02,636  INFO [pool-1-thread-1] (ProcessExecute.java:68) - Caused by: java.io.FileNotFoundException: /tmp/jars/first.jar (No such file or directory)
Stdoutput 2015-08-25 03:36:02,636  INFO [pool-1-thread-1] (ProcessExecute.java:68) -    at java.util.zip.ZipFile.open(Native Method)
Stdoutput 2015-08-25 03:36:02,637  INFO [pool-1-thread-1] (ProcessExecute.java:68) -    at java.util.zip.ZipFile.<init>(ZipFile.java:215)
Stdoutput 2015-08-25 03:36:02,637  INFO [pool-1-thread-1] (ProcessExecute.java:68) -    at java.util.zip.ZipFile.<init>(ZipFile.java:145)
Stdoutput 2015-08-25 03:36:02,637  INFO [pool-1-thread-1] (ProcessExecute.java:68) -    at java.util.jar.JarFile.<init>(JarFile.java:154)
Stdoutput 2015-08-25 03:36:02,637  INFO [pool-1-thread-1] (ProcessExecute.java:68) -    at java.util.jar.JarFile.<init>(JarFile.java:91)
Stdoutput 2015-08-25 03:36:02,640  INFO [pool-1-thread-1] (ProcessExecute.java:68) -    at org.apache.hadoop.util.RunJar.main(RunJar.java:122)
Exit code of the Shell command 1

The relevant files are as follows:

job.properties:

nameNode=maprfs:///
jobTracker=maprfs:///
queueName=nitin
EXEC=execution.jar
oozie.libpath=${nameNode}/user/oozie/share/lib
oozie.use.system.libpath=true

oozie.wf.application.path=maprfs:/dev/user/oozieTest

workflow.xml

<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<workflow-app name="test" xmlns="uri:oozie:workflow:0.4">
    <start to="first" />
    <action name="first">
        <shell xmlns="uri:oozie:shell-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
            </configuration>
            <exec>script</exec>
            <argument>-type mine</argument>
            <argument>-cfg config.cfg</argument>
            <file>script</file>
            <file>${EXEC}#${EXEC}</file>
            <file>config.cfg</file>
            <file>first.jar#first.jar</file>
            <file>second.jar#second.jar</file>
        </shell>
        <ok to="end" />
        <error to="fail" />
    </action>
    <kill name="fail">
        <message>Workflow failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end" />
</workflow-app>

script:

#!/bin/bash
#get the user who executed the script
EXECUTING_USER="user1"

# get start time

NOW=$(date +"%T")

#get the host name

HOST="$HOSTNAME"

ARGUMENTSTRING="$@ -user user1 -startTime $NOW"
echo "Passing the following arguments : $ARGUMENTSTRING"

java -cp execution.jar com.hadoop.test.Main "$ARGUMENTSTRING"

exit $?

My execution.jar picks up first.jar from the /tmp/jars directory. I chose that directory because it does not cause any permission issues for the Oozie workflow user.

Any direction or suggestions would be really helpful.

My question in a nutshell:

  • I want to execute a script in an Oozie shell action node.
  • The script executed from the Oozie shell action node will run a Java program.
  • That Java program will run first.jar or second.jar, depending on its arguments.

Answer 1:

I would suggest moving the dependency out of the shell script and into Java code, then running it with the Oozie java action node, which simplifies the process to a good extent.

If running the Java jar from an Oozie shell action node is your only option, it can be done, but as far as I know it is a little more complicated.

The main concerns are:

  • An Oozie action cannot rely on files on the local file system of a particular node, because the action may run on any node in the cluster; it can only reference content on HDFS.
  • The java command can only reference files on the local file system.
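
To make the second point easier to diagnose, a small guard can be added to the action's script before it calls java. This is only a sketch: `require_local_file` is a hypothetical helper name, not part of Oozie or Hadoop, and the path in the usage comment is the one from the error message in the question.

```shell
#!/bin/bash
# Hypothetical helper: verify that a file exists on the local file system
# of the node running the action before handing it to java.
require_local_file() {
    local path="$1"
    if [ -f "$path" ]; then
        echo "found local file: $path"
        return 0
    fi
    # Report where the script is actually running to help locate the problem.
    echo "missing local file: $path (working directory: $(pwd), host: ${HOSTNAME:-unknown})" >&2
    return 1
}

# Example use inside the action's script:
# require_local_file /tmp/jars/first.jar || exit 1
```

If the guard fails, the message shows the node and working directory of the action, which may differ from the machine where the jar was placed by hand.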

The following steps may help you achieve what you expect:

  1. Place your jar file on HDFS.
  2. Pass the absolute HDFS path of the jar as an argument to the shell script.
  3. From the shell script, copy the jar from HDFS to a fixed location on the local file system of the node where the action is running (for example /tmp, as you preferred), using the copyToLocal command.
  4. Invoke the jar file using the java command on that node.
  5. On completion, if the jar produces any output that must be passed to the next action, copy those output files from the local file system to HDFS from the shell script using copyFromLocal.
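
Steps 2 to 5 could be sketched in the shell script roughly as follows. This is a minimal sketch under stated assumptions, not a tested Oozie setup: the function names and the `HDFS_CMD` variable are illustrative, `/tmp/jars` is the directory mentioned in the question, and `java -jar` assumes the jar declares a Main-Class in its manifest.

```shell
#!/bin/bash
# Sketch of steps 2-5. HDFS_CMD and all function names are illustrative.
HDFS_CMD="${HDFS_CMD:-hadoop fs}"   # overridable, e.g. for a dry run

# Step 3: copy a jar from HDFS to a local directory and print the local path.
fetch_jar() {
    local hdfs_path="$1" local_dir="$2"
    local local_jar
    local_jar="$local_dir/$(basename "$hdfs_path")"
    mkdir -p "$local_dir" || return 1
    rm -f "$local_jar"    # copyToLocal fails if the destination already exists
    $HDFS_CMD -copyToLocal "$hdfs_path" "$local_jar" >&2 || return 1
    echo "$local_jar"
}

# Step 5: copy a local output file back to HDFS for the next action.
publish_output() {
    local local_file="$1" hdfs_dir="$2"
    $HDFS_CMD -copyFromLocal "$local_file" "$hdfs_dir/"
}

# Steps 2-4: the first argument is the jar's HDFS path, the rest go to the jar.
run_jar_from_hdfs() {
    local jar_hdfs_path="$1"; shift
    local local_jar
    local_jar="$(fetch_jar "$jar_hdfs_path" "/tmp/jars")" || return 1
    java -jar "$local_jar" "$@"   # assumes a Main-Class manifest entry
}

# Example call from the script body:
# run_jar_from_hdfs "$1" "${@:2}"
```

The HDFS path would be supplied through an `<argument>` element in workflow.xml. If the jar is a MapReduce job rather than a plain Java program, `hadoop jar` may be the more appropriate launcher than `java -jar`.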