spark2 + yarn - nullpointerexception while preparing AM container

Posted 2019-07-09 01:44

Question:

I'm trying to run

pyspark --master yarn
  • Spark version: 2.0.0
  • Hadoop version: 2.7.2
  • Hadoop yarn web interface is successfully started

This is what happens:

16/08/15 10:00:12 DEBUG Client: Using the default MR application classpath: $HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*,$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*
16/08/15 10:00:12 INFO Client: Preparing resources for our AM container
16/08/15 10:00:12 DEBUG Client: 
16/08/15 10:00:12 DEBUG DFSClient: /user/mispp/.sparkStaging/application_1471254869164_0006: masked=rwxr-xr-x
16/08/15 10:00:12 DEBUG Client: IPC Client (1933573135) connection to sm/192.168.29.71:8020 from mispp sending #8
16/08/15 10:00:12 DEBUG Client: IPC Client (1933573135) connection to sm/192.168.29.71:8020 from mispp got value #8
16/08/15 10:00:12 DEBUG ProtobufRpcEngine: Call: mkdirs took 14ms
16/08/15 10:00:12 DEBUG Client: IPC Client (1933573135) connection to sm/192.168.29.71:8020 from mispp sending #9
16/08/15 10:00:12 DEBUG Client: IPC Client (1933573135) connection to sm/192.168.29.71:8020 from mispp got value #9
16/08/15 10:00:12 DEBUG ProtobufRpcEngine: Call: setPermission took 10ms
16/08/15 10:00:12 DEBUG Client: IPC Client (1933573135) connection to sm/192.168.29.71:8020 from mispp sending #10
16/08/15 10:00:12 DEBUG Client: IPC Client (1933573135) connection to sm/192.168.29.71:8020 from mispp got value #10
16/08/15 10:00:12 DEBUG ProtobufRpcEngine: Call: getFileInfo took 2ms
16/08/15 10:00:12 INFO Client: Deleting staging directory hdfs://sm/user/mispp/.sparkStaging/application_1471254869164_0006
16/08/15 10:00:12 DEBUG Client: IPC Client (1933573135) connection to sm/192.168.29.71:8020 from mispp sending #11
16/08/15 10:00:12 DEBUG Client: IPC Client (1933573135) connection to sm/192.168.29.71:8020 from mispp got value #11
16/08/15 10:00:12 DEBUG ProtobufRpcEngine: Call: delete took 14ms
16/08/15 10:00:12 ERROR SparkContext: Error initializing SparkContext.
java.lang.NullPointerException
        at scala.collection.mutable.ArrayOps$ofRef$.newBuilder$extension(ArrayOps.scala:190)
        at scala.collection.mutable.ArrayOps$ofRef.newBuilder(ArrayOps.scala:186)
        at scala.collection.TraversableLike$class.filterImpl(TraversableLike.scala:246)
        at scala.collection.TraversableLike$class.filter(TraversableLike.scala:259)
        at scala.collection.mutable.ArrayOps$ofRef.filter(ArrayOps.scala:186)
        at org.apache.spark.deploy.yarn.Client$$anonfun$prepareLocalResources$6.apply(Client.scala:484)
        at org.apache.spark.deploy.yarn.Client$$anonfun$prepareLocalResources$6.apply(Client.scala:480)
        at scala.collection.mutable.ArraySeq.foreach(ArraySeq.scala:74)
        at org.apache.spark.deploy.yarn.Client.prepareLocalResources(Client.scala:480)
        at org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:834)
        at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:167)
        at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:56)
        at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:149)
        at org.apache.spark.SparkContext.<init>(SparkContext.scala:500)
        at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:240)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
        at py4j.Gateway.invoke(Gateway.java:236)
        at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
        at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
        at py4j.GatewayConnection.run(GatewayConnection.java:211)
        at java.lang.Thread.run(Thread.java:745)
16/08/15 10:00:12 DEBUG AbstractLifeCycle: stopping org.spark_project.jetty.server.Server@69e507eb
16/08/15 10:00:12 DEBUG Server: Graceful shutdown org.spark_project.jetty.server.Server@69e507eb by 

yarn-site.xml: (the last property is something I found online, so I just tried it to see if it would work)

<configuration>

<!-- Site specific YARN configuration properties -->
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>sm:8025</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>sm:8030</value>
    </property>
    <property>
        <name>yarn.resourcemanager.address</name>
        <value>sm:8050</value>
    </property>
    <property>
        <name>yarn.application.classpath</name>
        <value>/home/mispp/hadoop-2.7.2/share/hadoop/yarn</value>
    </property>
</configuration>

.bashrc:

export HADOOP_PREFIX=/home/mispp/hadoop-2.7.2
export PATH=$PATH:$HADOOP_PREFIX/bin
export HADOOP_HOME=$HADOOP_PREFIX
export HADOOP_COMMON_HOME=$HADOOP_PREFIX
export HADOOP_YARN_HOME=$HADOOP_PREFIX
export HADOOP_HDFS_HOME=$HADOOP_PREFIX
export HADOOP_MAPRED_HOME=$HADOOP_PREFIX
export HADOOP_CONF_DIR=$HADOOP_PREFIX/etc/hadoop
export YARN_CONF_DIR=$HADOOP_PREFIX/etc/hadoop

Any idea why this happens? The cluster is set up in 3 LXD containers (a master plus two compute nodes) on a server with 16 GB of RAM.

Answer 1:

Given the location of the error in the Spark 2.0.0 code:

https://github.com/apache/spark/blob/v2.0.0/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala#L480

I suspect the error is caused by a misconfiguration of spark.yarn.jars. I would double-check that the value of this setting in your setup is correct, according to the documentation at http://spark.apache.org/docs/2.0.0/running-on-yarn.html#spark-properties.
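
One way to sanity-check this is to stage the Spark jars on HDFS yourself and point spark.yarn.jars at them explicitly. A minimal sketch, assuming the jars are copied to an HDFS directory of your choosing (the path below is only an example and should be adapted to your cluster; sm:8020 is the namenode address visible in your logs):

hdfs dfs -mkdir -p /user/spark/share/lib
hdfs dfs -put $SPARK_HOME/jars/*.jar /user/spark/share/lib/

# then, in conf/spark-defaults.conf:
spark.yarn.jars  hdfs://sm:8020/user/spark/share/lib/*.jar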



Answer 2:

Usually (not always, apparently) when you see

ERROR SparkContext: Error initializing SparkContext.

when running on YARN, it means the Spark application couldn't start because it could not get enough resources (again, usually memory). So that's the first thing you should check.

You could paste your spark-defaults.conf here. If you don't have one, note that the default value of spark.executor.memory is 1g. You could try to override this value, for example:

pyspark --executor-memory 256m

to see if it starts or not.
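
If you prefer to keep this out of the command line, the equivalent settings could also go into conf/spark-defaults.conf. A minimal sketch with deliberately small test values (not recommendations):

spark.driver.memory    512m
spark.executor.memory  256m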

Also, there is no resource configuration (e.g., yarn.nodemanager.resource.memory-mb) in your yarn-site.xml, so it is likely that you're not giving YARN enough resources to allocate. Given the size of your machine, you should make these values explicit.
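
For illustration, a sketch of what such explicit resource settings could look like in yarn-site.xml; the numbers below are only example values for a container with a few GB to spare and should be sized to your actual hosts:

    <property>
        <name>yarn.nodemanager.resource.memory-mb</name>
        <value>4096</value>
    </property>
    <property>
        <name>yarn.scheduler.minimum-allocation-mb</name>
        <value>256</value>
    </property>
    <property>
        <name>yarn.scheduler.maximum-allocation-mb</name>
        <value>4096</value>
    </property>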



Answer 3:

I just upvoted @tinfoiled's answer, but I'd like to comment here on the syntax of the spark.yarn.jars property (it ends with an 's'), since I spent quite some time figuring it out.

The correct syntax (which the OP probably already knows) is:

spark.yarn.jars=hdfs://xxx:9000/user/spark/share/lib/*.jar

At first I didn't put *.jar at the end, and that resulted in the ApplicationMaster failing to load. I tried all sorts of combinations, but nothing worked. In fact, I posted a question on Stack Overflow about the same problem: Property spark.yarn.jars - how to deal with it?
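
For what it's worth, the same property can also be passed directly when launching the shell, which makes it easy to experiment with different values (reusing the example path from above):

pyspark --master yarn --conf "spark.yarn.jars=hdfs://xxx:9000/user/spark/share/lib/*.jar"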

I wasn't even sure whether what I was doing was the right approach, but the OP's question and @tinfoiled's answer gave me some confidence, and I was finally able to make use of this property.