I'm new to Hadoop and I'm testing a simple workflow with just a single Sqoop action. It works if I use plain values rather than global properties.
My objective, however, was to define some global properties in a file referenced by the job-xml tag in the global section.
After a long fight and reading many articles, I still cannot make it work. I suspect something simple is wrong, since I found articles suggesting that this feature works fine.
Hopefully, you can give me a hint.
In short:
- I have properties dbserver, dbuser and dbpassword defined in /user/dm/conf/environment.xml
- These properties are referenced in my /user/dm/jobs/sqoop-test/workflow.xml
- At runtime, I receive an EL_ERROR saying that the dbserver variable cannot be resolved
Here are the details:
I'm using the Cloudera 5.7.1 distribution installed on a single node.
The environment.xml file was uploaded to HDFS into the /user/dm/conf folder.
Here is the content:
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
    <property>
        <name>dbserver</name>
        <value>someserver</value>
    </property>
    <property>
        <name>dbuser</name>
        <value>someuser</value>
    </property>
    <property>
        <name>dbpassword</name>
        <value>somepassword</value>
    </property>
</configuration>
The workflow.xml file was uploaded into /user/dm/jobs/sqoop-test-job. Here is the content:
<?xml version="1.0" encoding="UTF-8"?>
<workflow-app xmlns="uri:oozie:workflow:0.4" name="sqoop-test">
    <global>
        <job-xml>/user/dm/conf/env.xml</job-xml>
    </global>
    <start to="get-data"/>
    <action name="get-data">
        <sqoop xmlns="uri:oozie:sqoop-action:0.3">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <prepare>
                <delete path="${outputRootPath}"/>
            </prepare>
            <arg>import</arg>
            <arg>--connect</arg>
            <arg>jdbc:sqlserver://${dbserver};user=${dbuser};password=${dbpassword}</arg>
            <arg>--query</arg>
            <arg>select col1 from table where $CONDITIONS</arg>
            <arg>--split-by</arg>
            <arg>main_id</arg>
            <arg>--target-dir</arg>
            <arg>${outputRootPath}/table</arg>
            <arg>-m</arg>
            <arg>1</arg>
        </sqoop>
        <ok to="end"/>
        <error to="kill"/>
    </action>
    <kill name="kill">
        <message>Sqoop-test failed, error message[${wf:errorMessage()}]</message>
    </kill>
    <end name='end'/>
</workflow-app>
Now, I execute the Oozie workflow from the command line:
sudo -u dm oozie job --oozie http://host:11000/oozie -config job-config.xml -run
where my job-config.xml is as follows:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<configuration>
    <property>
        <name>nameNode</name>
        <value>namenode:8020</value>
    </property>
    <property>
        <name>jobTracker</name>
        <value>jobtracker:8021</value>
    </property>
    <property>
        <name>oozie.wf.application.path</name>
        <value>/user/dm/jobs/sqoop-test-job/workflow.xml</value>
    </property>
    <property>
        <name>outputRootPath</name>
        <value>/user/dm/data/sqoop-test</value>
    </property>
</configuration>
OK, you are making two big mistakes.
1. Let's start with a quick exegesis of some parts of the Oozie documentation (V4.2):
- Workflow Functional Specification, <global> element
- Sqoop action Extension
In other words: the Sqoop action is a plug-in as far as the Oozie server is concerned. It does not support 100% of the "newer" functionalities, including the <global> thing that was introduced in Workflow schema V0.4.
2. You don't understand the distinction between properties and parameters -- and I don't blame you, the Oozie docs are confused and confusing.
Parameters are used by Oozie to run text substitutions in properties, in commands, etc. You define their values as literals, either at submission time with the -config argument, or in the <parameters> element at Workflow level. And by "literal" I mean that you cannot make reference to a parameter in another parameter. The value is just immutable text, used as-is.
Properties are Java properties passed to the jobs that Oozie starts. You can set them
- either at submission time with the -config argument -- yes, it's a mess, the Oozie parser has to sort out which params have a well-known property name and which ones are just params
- or in the <global> Workflow element -- but they will not be propagated in all "extensions", as you have discovered the hard way
- or in the <property> Action element
- or inside an XML file defined with a <job-xml> element, either at global Workflow level or at local Action level.
Two things to note:
- <job-xml> files must be literals because Oozie does not have access to them (it just passes the file content to the Hadoop Configuration constructor at run-time)
What does it mean for you? Well, your script tells Oozie to pass "hidden" properties to the JVM running the Sqoop job, at run-time, through a <job-xml>.
But you were expecting Oozie to parse a list of parameters and use them, at compile time, to define some properties. That won't happen.
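As a minimal sketch of what that distinction implies: since parameters can be supplied at submission time with -config, one option is to move the three database values out of the job-xml file and into job-config.xml, where Oozie can see them when it substitutes ${dbserver}, ${dbuser} and ${dbpassword} in workflow.xml. The values below are just the placeholders from the question, and this has not been tested against your cluster.

```xml
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<configuration>
    <!-- Existing entries from the question, unchanged -->
    <property>
        <name>nameNode</name>
        <value>namenode:8020</value>
    </property>
    <property>
        <name>jobTracker</name>
        <value>jobtracker:8021</value>
    </property>
    <property>
        <name>oozie.wf.application.path</name>
        <value>/user/dm/jobs/sqoop-test-job/workflow.xml</value>
    </property>
    <property>
        <name>outputRootPath</name>
        <value>/user/dm/data/sqoop-test</value>
    </property>
    <!-- Parameters moved here from environment.xml so that Oozie can
         use them for text substitution in the Sqoop arguments -->
    <property>
        <name>dbserver</name>
        <value>someserver</value>
    </property>
    <property>
        <name>dbuser</name>
        <value>someuser</value>
    </property>
    <property>
        <name>dbpassword</name>
        <value>somepassword</value>
    </property>
</configuration>
```

The submission command stays the same. The other place for such values mentioned above is the <parameters> element at Workflow level.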