Passing external yml file in my spark-job/code not

Posted 2020-01-29 18:05

Question:

I am using Spark 2.4.1 and Java 8. I am trying to load an external properties file when submitting my Spark job with spark-submit.

I am using the Typesafe Config library below to load my properties file.

<groupId>com.typesafe</groupId>
<artifactId>config</artifactId>
<version>1.3.1</version>

In my Spark driver class, MyDriver.java, I load the YML file as follows:

String ymlFilename = args[1].toString();
Optional<QueryEntities>  entities =  InputYamlProcessor.process(ymlFilename);

All of the code, including InputYamlProcessor.java, is here:

https://gist.github.com/BdLearnerr/e4c47c5f1dded951b18844b278ea3441

This works fine on my local machine, but when I run it on the cluster it fails with the following error.

Error :

Can't construct a java object for tag:yaml.org,2002:com.snp.yml.QueryEntities; exception=Class not found: com.snp.yml.QueryEntities
 in 'reader', line 1, column 1:
    entities:
    ^

        at org.yaml.snakeyaml.constructor.Constructor$ConstructYamlObject.construct(Constructor.java:345)
        at org.yaml.snakeyaml.constructor.BaseConstructor.getSingleData(BaseConstructor.java:127)
        at org.yaml.snakeyaml.Yaml.loadFromReader(Yaml.java:450)
        at org.yaml.snakeyaml.Yaml.loadAs(Yaml.java:444)
        at com.snp.yml.InputYamlProcessor.process(InputYamlProcessor.java:62)
Caused by: org.yaml.snakeyaml.error.YAMLException: Class not found: com.snp.yml.QueryEntities
        at org.yaml.snakeyaml.constructor.Constructor.getClassForNode(Constructor.java:650)
        at org.yaml.snakeyaml.constructor.Constructor$ConstructYamlObject.getConstructor(Constructor.java:331)
        at org.yaml.snakeyaml.constructor.Constructor$ConstructYamlObject.construct(Constructor.java:341)
        ... 12 more

My spark-submit script is:

 $SPARK_HOME/bin/spark-submit \
    --master yarn \
    --deploy-mode cluster \
    --name MyDriver  \
    --jars "/local/jars/*.jar" \
    --files hdfs://files/application-cloud-dev.properties,hdfs://files/column_family_condition.yml \
    --class com.sp.MyDriver \
    --executor-cores 3 \
    --executor-memory 9g \
    --num-executors 5 \
    --driver-cores 2 \
    --driver-memory 4g \
    --driver-java-options -Dconfig.file=./application-cloud-dev.properties \
    --conf spark.executor.extraJavaOptions=-Dconfig.file=./application-cloud-dev.properties \
    --conf spark.driver.extraClassPath=. \
    --driver-class-path . \
     ca-datamigration-0.0.1.jar application-cloud-dev.properties column_family_condition.yml

What am I doing wrong here? How can I fix this issue? Any help is highly appreciated.

Tested :

To check whether the problem is really a missing class, I added the following method to the class and called it just before the line that throws the error above.

public static void printTest() {
    QueryEntity e1 = new QueryEntity();
    e1.setTableName("tab1");
    List<QueryEntity> li = new ArrayList<QueryEntity>();
    li.add(e1);


    QueryEntities ll = new QueryEntities();
    ll.setEntitiesList(li);

    ll.getEntitiesList().stream().forEach(e -> logger.error("e1 Name :" + e.getTableName()));


    return;
}

Output :

19/09/18 04:40:33 ERROR yml.InputYamlProcessor: e1 Name :tab1
    Can't construct a java object for tag:yaml.org,2002:com.snp.helpers.QueryEntities; exception=Class not found: com.snp.helpers.QueryEntities
             in 'reader', line 1, column 1:
                entitiesList:
         at org.yaml.snakeyaml.constructor.Constructor$ConstructYamlObject.construct(Constructor.java:345)

What is wrong here?

Answer 1:

This has nothing to do with the QueryEntities class itself. The error "YAMLException: Class not found: com.snp.yml.QueryEntities" is an issue with how the YAML constructor is created.

Changed to:

Yaml yaml = new Yaml(new CustomClassLoaderConstructor(com.snp.helpers.QueryEntities.class.getClassLoader()));

From:

/*Constructor constructor = new Constructor(com.snp.helpers.QueryEntities.class);
        Yaml yaml = new Yaml( constructor );*/
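
A likely explanation, offered as an assumption rather than something confirmed in the question: SnakeYAML's default Constructor looks up the class named in the YAML tag through the thread context class loader, and in yarn cluster mode that loader may not see classes packaged in the application jar. CustomClassLoaderConstructor instead resolves the name through the loader you hand it. The sketch below uses only the JDK (no SnakeYAML, no Spark) to reproduce the same effect. ClassLoaderVisibilityDemo, canLoad and the nested QueryEntities stand-in are hypothetical names invented for illustration.

```java
import java.net.URL;
import java.net.URLClassLoader;

final class ClassLoaderVisibilityDemo {

    /** Hypothetical stand-in for the application's QueryEntities bean. */
    public static class QueryEntities { }

    /** Returns true if the given loader can resolve the class name. */
    public static boolean canLoad(String className, ClassLoader loader) {
        try {
            Class.forName(className, true, loader);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String name = QueryEntities.class.getName();

        // A loader with no URLs and no parent only sees JDK bootstrap classes.
        // It plays the role of a loader that cannot see the application jar.
        ClassLoader isolated = new URLClassLoader(new URL[0], null);

        // The loader that actually loaded the class can always resolve it again.
        ClassLoader own = QueryEntities.class.getClassLoader();

        System.out.println("isolated loader can load it: " + canLoad(name, isolated));
        System.out.println("class's own loader can load it: " + canLoad(name, own));
    }
}
```

This is also why the printTest() check in the question succeeds while the YAML load fails: "new QueryEntities()" is resolved by the loader of the calling class, whereas a lookup by name depends on whichever loader is asked. Passing QueryEntities.class.getClassLoader() explicitly removes that dependency.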