sqoop exec job in oozie is not working

2019-09-02 12:32发布

问题:

I am running a 3 node HDP 2.2 cluster. Oozie version is 4.1.0.2.2 and Sqoop version is 1.4.5.2.2. I am using Sqoop job to do incremental imports from RDBMS into HDFS as shown below,

sqoop job –create JOB1 –meta-connect “jdbc:hsqldb:hsql://ip-address:16000/sqoop” — import –connect jdbc:oracle:thin:@ip-address:db –username db_user –password-file hdfs://ip-address:8020/user/oozie/.password_sqoop –table TABLE1 –target-dir /user/incremental/ –incremental lastmodified –check-column LAST_UPDATED –last-value “2013-08-12 18:13:44.0″ –merge-key ID –fields-terminated-by ‘|';

sqoop job –exec JOB1

The above 2 sqoop commands are working perfectly fine when run from command prompt. I am using sqoop-metastore (HSQLDB) to store the sqoop jobs.

The sqoop create job is working in OOZIE and i am able to see the sqoop job being listed in the sqoop-metastore after oozie job completes.

But when i put the sqoop exec job in the oozie workflow i get, “Main class [org.apache.oozie.action.hadoop.SqoopMain], exit code [1]” error. The underlying map reduce job however shows as completed successfully. Checked for logs in /var/log/oozie but nothing there as well.

workflow.xml :

<workflow-app xmlns=”uri:oozie:workflow:0.4″ name=”oozie-wf”>
<start to=”sqoop-wf”/>
<action name=”sqoop-wf”>
<sqoop xmlns=”uri:oozie:sqoop-action:0.2″>
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<configuration>
<property>
<name>fs.hdfs.impl.disable.cache</name>
<value>true</value>
</property>
</configuration>
<command>job –meta-connect “jdbc:hsqldb:hsql://ip-address:16000/sqoop” –exec JOB1</command>
</sqoop>
<ok to=”end”/>
<error to=”fail”/>
</action>
<kill name=”fail”>
<message>Failed, Error Message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<end name=”end”/>
</workflow-app>

job.properties :

nameNode=hdfs://ip-address:8020
jobTracker=ip-address:8050

oozie.wf.application.path=hdfs://ip-address:8020/user/oozie/sqoopoozie
oozie.use.system.libpath=true
oozie.sqoop.log.level=DEBUG

I have tried in multiple different ways for sqoop exec job in oozie but nothing works. Please help.

回答1:

Please ensure you are specifying your metastore in sqoop-site.xml and you are passing the site xml in your oozie workflow folder. You can pass it with xml tag .



回答2:

It puzzles me as to how sqoop job --exec JOB1 works in the command prompt. The right syntax for executing the job (from command prompt) would be

sqoop job --exec JOB1 --meta-connect jdbc:hsqldb:hsql://ip-address:16000/sqoop

Perhaps you executed the job from the same machine that is running your sqoop metastore

Please make sure you have done the following

  1. Started sqoop metastore on a the machine with ip=ip-address
  2. Made necessary configuration changes in sqoop-site.xml (follow this link)

As for the oozie workflow, try the following workflow.xml

<workflow-app xmlns=”uri:oozie:workflow:0.4″ name=”oozie-wf”>
    <start to=”sqoop-wf”/>
    <action name=”sqoop-wf”>
        <sqoop xmlns=”uri:oozie:sqoop-action:0.2″>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <property>
                    <name>fs.hdfs.impl.disable.cache</name>
                    <value>true</value>
                </property>
            </configuration>
            <arg>job</arg>
            <arg>--exec</arg>
            <arg>JOB1</arg>
            <arg>--meta-connect jdbc:hsqldb:hsql://ip-address:16000/sqoop</arg>
        </sqoop>
        <ok to=”end”/>
        <error to=”fail”/>
    </action>
    <kill name=”fail”>
        <message>Failed, Error Message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name=”end”/>
</workflow-app>


回答3:

In the case of our cluster, creating jobs through Hue UI without specifying the metastore (using the --meta-connect parameter) resulted in jobs being only sometimes returned by the sqoop job --list command. In command prompt creating and listing jobs on the same machine worked; however, machine B could not find jobs created on machine A.

If you also can't access jobs can access jobs created on a different node, it seems like your sqoop metastore is not running. To solve it:

  1. start the metastore on one of the nodes with the sqoop-metastore command,
  2. check if the metastore is running: $ ps aux | grep sqoop should return one of the results as a Sqoop metastore,
  3. specify the metastore path in sqoop-site.xml files. In our case we couldn't make ot work though, so we did it using the next step,
  4. if necessary, both in shell and Hue UI, specify the metastore path in every sqoop job command (create, list, show, exec, delete) as
    $ sqoop job --exec my_job --meta-connect jdbc:hsqldb:hsql://node-running-metastore:16000/sqoop.

Note: To start the metastore, consider using & and nohup to run the metastore process in the background and to capture logs into a file:

$ nohup sqoop-metastore &>> /tmp/nohup-sqoop-metastore.out &