Spark 1.3.0: Running Pi example on YARN fails

2019-05-14 07:18发布

I have Hadoop 2.6.0.2.2.0.0-2041 with Hive 0.14.0.2.2.0.0-2041 After building Spark with command:

mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.6.0 -Phive -Phive-thriftserver -DskipTests package

I try to run Pi example on YARN with the following command:

export HADOOP_CONF_DIR=/etc/hadoop/conf
/var/home2/test/spark/bin/spark-submit \
--class org.apache.spark.examples.SparkPi \
--master yarn-cluster \
--executor-memory 3G \
--num-executors 50 \
hdfs:///user/test/jars/spark-examples-1.3.0-hadoop2.4.0.jar \
1000

I get exceptions: application_1427875242006_0029 failed 2 times due to AM Container for appattempt_1427875242006_0029_000002 exited with exitCode: 1 Which in fact is Diagnostics: Exception from container-launch.(please see log below).

Application tracking url reveals the following messages:

java.lang.Exception: Unknown container. Container either has not started or has already completed or doesn't belong to this node at all

and also:

Error: Could not find or load main class org.apache.spark.deploy.yarn.ApplicationMaster

I have Hadoop working fine on 4 nodes and completly at a loss how to make Spark work on YARN.

Should I set spark.yarn.access.namenodes Spark configuration property? Though my application does not need to access any name nodes directly, but maybe this will solve the problem?

Please advise where to look for, any ideas would be of great help, thank you!

Spark assembly has been built with Hive, including Datanucleus jars on classpath
15/04/06 10:53:40 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/04/06 10:53:42 INFO impl.TimelineClientImpl: Timeline service address: http://etl-hdp-yarn.foo.bar.com:8188/ws/v1/timeline/
15/04/06 10:53:42 INFO client.RMProxy: Connecting to ResourceManager at etl-hdp-yarn.foo.bar.com/192.168.0.16:8050
15/04/06 10:53:42 INFO yarn.Client: Requesting a new application from cluster with 4 NodeManagers
15/04/06 10:53:42 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (4096 MB per container)
15/04/06 10:53:42 INFO yarn.Client: Will allocate AM container, with 896 MB memory including 384 MB overhead
15/04/06 10:53:42 INFO yarn.Client: Setting up container launch context for our AM
15/04/06 10:53:42 INFO yarn.Client: Preparing resources for our AM container
15/04/06 10:53:43 WARN shortcircuit.DomainSocketFactory: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
15/04/06 10:53:43 INFO yarn.Client: Uploading resource file:/var/home2/test/spark-1.3.0/assembly/target/scala-2.10/spark-assembly-1.3.0-hadoop2.6.0.jar -> hdfs://etl-hdp-nn1.foo.bar.com:8020/user/test/.sparkStaging/application_1427875242006_0029/spark-assembly-1.3.0-hadoop2.6.0.jar
15/04/06 10:53:44 INFO yarn.Client: Source and destination file systems are the same. Not copying hdfs:/user/test/jars/spark-examples-1.3.0-hadoop2.4.0.jar
15/04/06 10:53:44 INFO yarn.Client: Setting up the launch environment for our AM container
15/04/06 10:53:44 INFO spark.SecurityManager: Changing view acls to: test
15/04/06 10:53:44 INFO spark.SecurityManager: Changing modify acls to: test
15/04/06 10:53:44 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(test); users with modify permissions: Set(test)
15/04/06 10:53:44 INFO yarn.Client: Submitting application 29 to ResourceManager
15/04/06 10:53:44 INFO impl.YarnClientImpl: Submitted application application_1427875242006_0029
15/04/06 10:53:45 INFO yarn.Client: Application report for application_1427875242006_0029 (state: ACCEPTED)
15/04/06 10:53:45 INFO yarn.Client: 
     client token: N/A
     diagnostics: N/A
     ApplicationMaster host: N/A
     ApplicationMaster RPC port: -1
     queue: default
     start time: 1428317623905
     final status: UNDEFINED
     tracking URL: http://etl-hdp-yarn.foo.bar.com:8088/proxy/application_1427875242006_0029/
     user: test
15/04/06 10:53:46 INFO yarn.Client: Application report for application_1427875242006_0029 (state: ACCEPTED)
15/04/06 10:53:47 INFO yarn.Client: Application report for application_1427875242006_0029 (state: ACCEPTED)
15/04/06 10:53:48 INFO yarn.Client: Application report for application_1427875242006_0029 (state: ACCEPTED)
15/04/06 10:53:49 INFO yarn.Client: Application report for application_1427875242006_0029 (state: FAILED)
15/04/06 10:53:49 INFO yarn.Client: 
     client token: N/A
     diagnostics: Application application_1427875242006_0029 failed 2 times due to AM Container for appattempt_1427875242006_0029_000002 exited with  exitCode: 1
For more detailed output, check application tracking page:http://etl-hdp-yarn.foo.bar.com:8088/proxy/application_1427875242006_0029/Then, click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_1427875242006_0029_02_000001
Exit code: 1
Exception message: /mnt/hdfs01/hadoop/yarn/local/usercache/test/appcache/application_1427875242006_0029/container_1427875242006_0029_02_000001/launch_container.sh: line 27: $PWD:$PWD/__spark__.jar:$HADOOP_CONF_DIR:/usr/hdp/current/hadoop-client/*:/usr/hdp/current/hadoop-client/lib/*:/usr/hdp/current/hadoop-hdfs-client/*:/usr/hdp/current/hadoop-hdfs-client/lib/*:/usr/hdp/current/hadoop-yarn-client/*:/usr/hdp/current/hadoop-yarn-client/lib/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/lib/*:$PWD/mr-framework/hadoop/share/hadoop/common/*:$PWD/mr-framework/hadoop/share/hadoop/common/lib/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/lib/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/lib/*:/usr/hdp/${hdp.version}/hadoop/lib/hadoop-lzo-0.6.0.${hdp.version}.jar:/etc/hadoop/conf/secure: bad substitution

Stack trace: ExitCodeException exitCode=1: /mnt/hdfs01/hadoop/yarn/local/usercache/test/appcache/application_1427875242006_0029/container_1427875242006_0029_02_000001/launch_container.sh: line 27: $PWD:$PWD/__spark__.jar:$HADOOP_CONF_DIR:/usr/hdp/current/hadoop-client/*:/usr/hdp/current/hadoop-client/lib/*:/usr/hdp/current/hadoop-hdfs-client/*:/usr/hdp/current/hadoop-hdfs-client/lib/*:/usr/hdp/current/hadoop-yarn-client/*:/usr/hdp/current/hadoop-yarn-client/lib/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/lib/*:$PWD/mr-framework/hadoop/share/hadoop/common/*:$PWD/mr-framework/hadoop/share/hadoop/common/lib/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/lib/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/lib/*:/usr/hdp/${hdp.version}/hadoop/lib/hadoop-lzo-0.6.0.${hdp.version}.jar:/etc/hadoop/conf/secure: bad substitution

    at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
    at org.apache.hadoop.util.Shell.run(Shell.java:455)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:715)
    at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:211)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)


Container exited with a non-zero exit code 1
Failing this attempt. Failing the application.
     ApplicationMaster host: N/A
     ApplicationMaster RPC port: -1
     queue: default
     start time: 1428317623905
     final status: FAILED
     tracking URL: http://etl-hdp-yarn.foo.bar.com:8088/cluster/app/application_1427875242006_0029
     user: test
Exception in thread "main" org.apache.spark.SparkException: Application finished with failed status
    at org.apache.spark.deploy.yarn.Client.run(Client.scala:622)
    at org.apache.spark.deploy.yarn.Client$.main(Client.scala:647)
    at org.apache.spark.deploy.yarn.Client.main(Client.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:569)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:166)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:189)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:110)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

3条回答
Fickle 薄情
2楼-- · 2019-05-14 07:30

This is a bug in the HDP - Spark Integration In your spark-defaults.conf add the following lines

spark.driver.extraJavaOptions -Dhdp.version=2.2.0.0–2041
spark.yarn.am.extraJavaOptions -Dhdp.version=2.2.0.0–2041

This should help address the issue

查看更多
Deceive 欺骗
3楼-- · 2019-05-14 07:38

I think your hadoop classpath is not setup.

lib/hadoop-lzo-0.6.0.${hdp.version}.jar:/etc/hadoop/conf/secure: bad substitution
查看更多
Lonely孤独者°
4楼-- · 2019-05-14 07:40

If you are using spark with hdp, then we have to do following things.

  1. Add these entries in your $SPARK_HOME/conf/spark-defaults.conf

    spark.driver.extraJavaOptions -Dhdp.version=2.2.0.0-2041 (your installed HDP version)

    spark.yarn.am.extraJavaOptions -Dhdp.version=2.2.0.0-2041 (your installed HDP version)

  2. create java-opts file in $SPARK_HOME/conf and add the installed HDP version in that file like

-Dhdp.version=2.2.0.0-2041 (your installed HDP version)

to know hdp verion please run command hdp-select status hadoop-client in the cluster

查看更多
登录 后发表回答