OOZIE: properties defined in file referenced in gl

Posted 2019-01-09 18:31

I'm new to Hadoop and I'm testing a simple workflow with just a single Sqoop action. It works if I use plain values instead of global properties.

My objective, however, was to define some global properties in a file referenced by the job-xml tag in the global section.

After a long fight and reading many articles, I still cannot make it work. I suspect something simple is wrong, since I found articles suggesting that this feature works fine.

Hopefully, you can give me a hint.

In short:

  1. I have the properties dbserver, dbuser and dbpassword defined in /user/dm/conf/environment.xml
  2. These properties are referenced in my /user/dm/jobs/sqoop-test/workflow.xml
  3. At runtime, I receive an EL_ERROR saying that the dbserver variable cannot be resolved

Here are details:

I'm using the Cloudera 5.7.1 distribution installed on a single node.

The environment.xml file was uploaded to HDFS, into the /user/dm/conf folder. Here is its content:

<?xml version="1.0" encoding="UTF-8"?>
<configuration>
    <property>
        <name>dbserver</name>
        <value>someserver</value>
    </property>
    <property>
        <name>dbuser</name>
        <value>someuser</value>
    </property>
    <property>
        <name>dbpassword</name>
        <value>somepassword</value>
    </property>
</configuration>

The workflow.xml file was uploaded to /user/dm/jobs/sqoop-test-job. Here is its content:

<?xml version="1.0" encoding="UTF-8"?>
<workflow-app xmlns="uri:oozie:workflow:0.4" name="sqoop-test">
    <global>
        <job-xml>/user/dm/conf/env.xml</job-xml>
    </global>
    <start to="get-data"/>
    <action name="get-data">
        <sqoop xmlns="uri:oozie:sqoop-action:0.3">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>       
            <prepare>
                <delete path="${outputRootPath}"/>
            </prepare>
            <arg>import</arg>
            <arg>--connect</arg>
            <arg>jdbc:sqlserver://${dbserver};user=${dbuser};password=${dbpassword}</arg>
            <arg>--query</arg>
            <arg>select col1 from table where $CONDITIONS</arg>
            <arg>--split-by</arg>
            <arg>main_id</arg>
            <arg>--target-dir</arg>
            <arg>${outputRootPath}/table</arg>
            <arg>-m</arg>
            <arg>1</arg>
        </sqoop>
        <ok to="end"/>
        <error to="kill"/>
    </action>
    <kill name="kill">
        <message>Sqoop-test failed, error message[${wf:errorMessage()}]</message>
    </kill>
    <end name='end'/>
</workflow-app>

Now, I execute the Oozie workflow from the command line:

sudo -u dm oozie job --oozie http://host:11000/oozie -config job-config.xml -run

Where my job-config.xml is as follows:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<configuration>
    <property>
        <name>nameNode</name>
        <value>namenode:8020</value>
    </property>
    <property>
        <name>jobTracker</name>
        <value>jobtracker:8021</value>
    </property>
    <property>
        <name>oozie.wf.application.path</name>
        <value>/user/dm/jobs/sqoop-test-job/workflow.xml</value>
    </property>
    <property>
        <name>outputRootPath</name>
        <value>/user/dm/data/sqoop-test</value>
    </property>
</configuration>

1 answer
闹够了就滚
Reply #2 · 2019-01-09 18:50

OK, you are making two big mistakes.

1. Let's start with a quick exegesis of some parts of the Oozie documentation (V4.2)

Workflow Functional Specification

  • has a section 19 about Global Configuration
  • has sections 3.2.x about core Action types i.e. MapReduce, Pig, Java, etc.
  • the XML schema specification clearly shows the <global> element

Sqoop action Extension

  • does not make any mention of Global parameters
  • has its own XML schema specification, which evolves at its own pace, and is not up-to-date with the Workflow schema

In other words: the Sqoop action is a plug-in as far as the Oozie server is concerned. It does not support 100% of the "newer" functionalities, including the <global> thing that was introduced in Workflow schema V0.4


2. You don't understand the distinction between properties and parameters -- and I don't blame you, the Oozie docs are confused and confusing.

Parameters are used by Oozie to run text substitutions in properties, in commands, etc. You define their values as literals, either at submission time with the -config argument, or in the <parameters> element at Workflow level. And by "literal" I mean that you cannot make reference to a parameter in another parameter. The value is just immutable text, used as-is.
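
As a rough sketch of the second option, formal parameters can be declared at the top of the workflow with the <parameters> element. The names reuse those from the question, and the default value for dbserver is only illustrative:

```xml
<workflow-app xmlns="uri:oozie:workflow:0.4" name="sqoop-test">
    <!-- Formal parameters: values are plain literal text, as described above -->
    <parameters>
        <property>
            <!-- No default given: the value must be supplied at submission time with -config -->
            <name>dbuser</name>
        </property>
        <property>
            <name>dbserver</name>
            <!-- Illustrative default, used only when -config does not provide a value -->
            <value>someserver</value>
        </property>
    </parameters>
    <start to="get-data"/>
    <!-- action, kill and end nodes as in the question -->
</workflow-app>
```

With such a declaration, ${dbserver} and ${dbuser} can be substituted anywhere in the workflow definition.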

Properties are Java properties passed to the jobs that Oozie starts. You can set them in several places:

  • at submission time with the -config argument (yes, it's a mess: the Oozie parser has to sort out which params have a well-known property name and which ones are just params)
  • in the <global> Workflow element (but they will not be propagated to all "extensions", as you have discovered the hard way)
  • in the <property> Action element
  • inside an XML file defined with a <job-xml> element, either at global Workflow level or at local Action level

Two things to note:

  • when properties are defined multiple times with multiple (conflicting) values, there has to be a precedence rule, but I'm not sure what it is
  • properties defined explicitly inside Oozie may have their value defined dynamically, using parameters and EL functions; but properties defined inside <job-xml> files must be literals because Oozie does not have access to them (it just passes the file content to the Hadoop Configuration constructor at run-time)
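
To illustrate the second point, here is a minimal sketch of a Sqoop action that uses both mechanisms. The queueName parameter is hypothetical (it would have to be supplied with -config), and the job-xml path is the one where the question says the file was uploaded:

```xml
<sqoop xmlns="uri:oozie:sqoop-action:0.3">
    <job-tracker>${jobTracker}</job-tracker>
    <name-node>${nameNode}</name-node>
    <!-- File content is handed to the job configuration as-is: its values must be literals -->
    <job-xml>/user/dm/conf/environment.xml</job-xml>
    <configuration>
        <property>
            <!-- Inline property: Oozie substitutes ${queueName} (hypothetical parameter) before launching the action -->
            <name>mapred.job.queue.name</name>
            <value>${queueName}</value>
        </property>
    </configuration>
    <arg>import</arg>
    <!-- remaining <arg> elements as in the question -->
</sqoop>
```

The inline property can use ${...} substitution because Oozie parses it itself; the values in the job-xml file only become visible to the launched job.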

What does it mean for you? Well, your workflow tells Oozie to pass "hidden" properties to the JVM running the Sqoop job, at run-time, through a <job-xml>.
But you were expecting Oozie to parse that file as a list of parameters and use them, at compile time (when it resolves ${dbserver} and friends in the workflow definition), to define some properties. That won't happen.
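
One possible way out, sketched under the assumption that keeping the three values in the submission file is acceptable to you: declare them in job-config.xml, so that they reach Oozie as parameters through -config and can be substituted in the <arg> elements. The values are the placeholders from the question:

```xml
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<configuration>
    <!-- existing entries (nameNode, jobTracker, oozie.wf.application.path, outputRootPath) stay unchanged -->
    <property>
        <name>dbserver</name>
        <value>someserver</value>
    </property>
    <property>
        <name>dbuser</name>
        <value>someuser</value>
    </property>
    <property>
        <name>dbpassword</name>
        <value>somepassword</value>
    </property>
</configuration>
```

The submission command from the question stays the same, and the <global> section with its <job-xml> is then no longer needed for these three values.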
