I am running a 3 node HDP 2.2 cluster. Oozie version is 4.1.0.2.2 and Sqoop version is 1.4.5.2.2. I am using Sqoop job to do incremental imports from RDBMS into HDFS as shown below,
sqoop job –create JOB1 –meta-connect “jdbc:hsqldb:hsql://ip-address:16000/sqoop” — import –connect jdbc:oracle:thin:@ip-address:db –username db_user –password-file hdfs://ip-address:8020/user/oozie/.password_sqoop –table TABLE1 –target-dir /user/incremental/ –incremental lastmodified –check-column LAST_UPDATED –last-value “2013-08-12 18:13:44.0″ –merge-key ID –fields-terminated-by ‘|';
sqoop job –exec JOB1
The above 2 sqoop commands are working perfectly fine when run from command prompt. I am using sqoop-metastore (HSQLDB) to store the sqoop jobs.
The sqoop create job is working in OOZIE and i am able to see the sqoop job being listed in the sqoop-metastore after oozie job completes.
But when i put the sqoop exec job in the oozie workflow i get, “Main class [org.apache.oozie.action.hadoop.SqoopMain], exit code [1]”
error. The underlying map reduce job however shows as completed successfully. Checked for logs in /var/log/oozie but nothing there as well.
workflow.xml :
<workflow-app xmlns=”uri:oozie:workflow:0.4″ name=”oozie-wf”>
<start to=”sqoop-wf”/>
<action name=”sqoop-wf”>
<sqoop xmlns=”uri:oozie:sqoop-action:0.2″>
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<configuration>
<property>
<name>fs.hdfs.impl.disable.cache</name>
<value>true</value>
</property>
</configuration>
<command>job –meta-connect “jdbc:hsqldb:hsql://ip-address:16000/sqoop” –exec JOB1</command>
</sqoop>
<ok to=”end”/>
<error to=”fail”/>
</action>
<kill name=”fail”>
<message>Failed, Error Message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<end name=”end”/>
</workflow-app>
job.properties :
nameNode=hdfs://ip-address:8020
jobTracker=ip-address:8050
oozie.wf.application.path=hdfs://ip-address:8020/user/oozie/sqoopoozie
oozie.use.system.libpath=true
oozie.sqoop.log.level=DEBUG
I have tried in multiple different ways for sqoop exec job in oozie but nothing works. Please help.
Please ensure you are specifying your metastore in sqoop-site.xml and you are passing the site xml in your oozie workflow folder. You can pass it with xml tag .
It puzzles me as to how sqoop job --exec JOB1
works in the command prompt. The right syntax for executing the job (from command prompt) would be
sqoop job --exec JOB1 --meta-connect jdbc:hsqldb:hsql://ip-address:16000/sqoop
Perhaps you executed the job from the same machine that is running your sqoop metastore
Please make sure you have done the following
- Started sqoop metastore on a the machine with ip=ip-address
- Made necessary configuration changes in sqoop-site.xml (follow this link)
As for the oozie workflow, try the following workflow.xml
<workflow-app xmlns=”uri:oozie:workflow:0.4″ name=”oozie-wf”>
<start to=”sqoop-wf”/>
<action name=”sqoop-wf”>
<sqoop xmlns=”uri:oozie:sqoop-action:0.2″>
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<configuration>
<property>
<name>fs.hdfs.impl.disable.cache</name>
<value>true</value>
</property>
</configuration>
<arg>job</arg>
<arg>--exec</arg>
<arg>JOB1</arg>
<arg>--meta-connect jdbc:hsqldb:hsql://ip-address:16000/sqoop</arg>
</sqoop>
<ok to=”end”/>
<error to=”fail”/>
</action>
<kill name=”fail”>
<message>Failed, Error Message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<end name=”end”/>
</workflow-app>
In the case of our cluster, creating jobs through Hue UI without specifying the metastore (using the --meta-connect
parameter) resulted in jobs being only sometimes returned by the sqoop job --list
command. In command prompt creating and listing jobs on the same machine worked; however, machine B could not find jobs created on machine A.
If you also can't access jobs can access jobs created on a different node, it seems like your sqoop metastore is not running. To solve it:
- start the metastore on one of the nodes with the
sqoop-metastore
command,
- check if the metastore is running:
$ ps aux | grep sqoop
should return one of the results as a Sqoop metastore,
- specify the metastore path in
sqoop-site.xml
files. In our case we couldn't make ot work though, so we did it using the next step,
- if necessary, both in shell and Hue UI, specify the metastore path in every sqoop job command (create, list, show, exec, delete) as
$ sqoop job --exec my_job --meta-connect jdbc:hsqldb:hsql://node-running-metastore:16000/sqoop
.
Note: To start the metastore, consider using &
and nohup
to run the metastore process in the background and to capture logs into a file:
$ nohup sqoop-metastore &>> /tmp/nohup-sqoop-metastore.out &