I am facing an issue while executing an Oozie Sqoop action. In the logs I can see that Sqoop imports the data to a temp directory and then generates a Hive script to load that data into Hive.
It fails while loading the temp data into Hive.
The logs show no exception.
Below is the Sqoop action I am using.
    <workflow-app name="testSqoopLoadWorkflow" xmlns="uri:oozie:workflow:0.4">
        <credentials>
            <credential name='hive_credentials' type='hcat'>
                <property>
                    <name>hcat.metastore.uri</name>
                    <value>${HIVE_THRIFT_URL}</value>
                </property>
                <property>
                    <name>hcat.metastore.principal</name>
                    <value>${KERBEROS_PRINCIPAL}</value>
                </property>
            </credential>
        </credentials>
        <start to="loadSqoopDataAction"/>
        <action name="loadSqoopDataAction" cred="hive_credentials">
            <sqoop xmlns="uri:oozie:sqoop-action:0.2">
                <job-tracker>${jobTracker}</job-tracker>
                <name-node>${nameNode}</name-node>
                <job-xml>/tmp/hive-oozie-site.xml</job-xml>
                <configuration>
                    <property>
                        <name>oozie.hive.defaults</name>
                        <value>/tmp/hive-oozie-site.xml</value>
                    </property>
                </configuration>
                <command>job --meta-connect ${SQOOP_METASTORE_URL} --exec TEST_SQOOP_LOAD_JOB</command>
            </sqoop>
            <ok to="end"/>
            <error to="kill"/>
        </action>
        <!-- kill and end nodes assumed -->
        <kill name="kill">
            <message>Sqoop load failed: [${wf:errorMessage(wf:lastErrorNode())}]</message>
        </kill>
        <end name="end"/>
    </workflow-app>
Below is the Sqoop job I am using to import the data.

    sqoop job --meta-connect ${SQOOP_METASTORE_URL} --create TEST_SQOOP_LOAD_JOB -- import --connect '${JDBC_URL}' --table testTable -m 1 --append --check-column pkId --incremental append --hive-import --hive-table testHiveTable;
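For reference, a job saved this way can be verified against the shared metastore with Sqoop's built-in job tool; these are standard sqoop commands, included here only as a sanity check:

    sqoop job --meta-connect ${SQOOP_METASTORE_URL} --list
    sqoop job --meta-connect ${SQOOP_METASTORE_URL} --show TEST_SQOOP_LOAD_JOB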
In the mapred logs I am getting the following:

    72285 [main] INFO org.apache.sqoop.hive.HiveImport - Loading uploaded data into Hive
    Intercepting System.exit(1)
    <<< Invocation of Main class completed <<<
    Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.SqoopMain], exit code [1]
    Oozie Launcher failed, finishing Hadoop job gracefully
    Oozie Launcher ends
Please suggest.
This looks like a typical Sqoop import-to-Hive job, so it appears Sqoop has successfully imported the data into HDFS and is failing to load that data into Hive.
Here's some background on what's happening: Oozie launches a separate launcher job (which can execute on any node in your Hadoop cluster) to run the Sqoop command. The Sqoop command starts a separate job to load the data into HDFS. Then, at the end of the Sqoop job, Sqoop runs a Hive script to load that data into Hive.
Since this theoretically runs from any node in your Hadoop cluster, the Hive CLI needs to be available on every node and must talk to the same metastore, which means the Hive metastore needs to run in remote mode. A quick sanity check is sketched below.
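Assuming shell access to an arbitrary worker node, a one-liner like this (my suggestion; any statement that touches the metastore will do) confirms both that the CLI is installed and that it can reach the shared metastore:

    hive -e 'show databases;'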
The most common problem is that Sqoop cannot talk to the correct metastore. The usual reasons are:
1. The Hive metastore service is not running. It should run in remote mode, started as a separate service. A quick way to check whether it's running is sketched below.
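On a service-managed installation, something like the following works; the service name hive-metastore is typical of packaged installs but varies by distribution, so treat this as a sketch:

    sudo service hive-metastore status

Alternatively, confirm that something is listening on the metastore port (9083 by default):

    netstat -an | grep 9083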
2. hive-site.xml does not contain hive.metastore.uris. An example hive-site.xml with hive.metastore.uris set is sketched below.
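A minimal sketch; the host sqoop2.example.com is a placeholder for whichever machine runs your metastore service, and 9083 is the usual default port:

    <configuration>
        <property>
            <name>hive.metastore.uris</name>
            <!-- every node must point at the same remote metastore -->
            <value>thrift://sqoop2.example.com:9083</value>
        </property>
    </configuration>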
3. hive-site.xml is not included in your Sqoop action (or its properties). Try adding your hive-site.xml to a <file> element in your Sqoop action. An example workflow.xml with <file> in it is sketched below.
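A sketch built from your own action above; the only addition is the <file> element, and the path assumes hive-site.xml has been uploaded to /tmp on HDFS:

    <action name="loadSqoopDataAction" cred="hive_credentials">
        <sqoop xmlns="uri:oozie:sqoop-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <command>job --meta-connect ${SQOOP_METASTORE_URL} --exec TEST_SQOOP_LOAD_JOB</command>
            <!-- ships hive-site.xml with the launcher so the Hive import step can find the metastore -->
            <file>/tmp/hive-site.xml#hive-site.xml</file>
        </sqoop>
        <ok to="end"/>
        <error to="kill"/>
    </action>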
If you are using CDH, the problem may also be due to Hive metastore jar dependency conflicts.
This seems to be a bug in Sqoop. I am not sure of the JIRA number. Hortonworks mentioned that the issue is still not resolved even in HDP 2.2.
@abeaamase - I want to try your solution.
I just want to check: does the solution below work for a Sqoop + Hive import in one single Oozie job?
    ... ... ... /tmp/hive-site.xml#hive-site.xml ... ...