After running the command

spark-submit --class org.apache.spark.examples.SparkPi --proxy-user yarn --master yarn --deploy-mode cluster --driver-memory 4g --executor-memory 2g --executor-cores 1 --queue default ./examples/jars/spark-examples_2.11-2.3.0.jar 10000

I get this in the output and it keeps on retrying. Where am I going wrong? Am I missing some configuration?

I have created a new user for yarn and running that user.

WARN  Utils:66 - Your hostname, ukaleem-HP-EliteBook-850-G3 resolves to a loopback address: 127.0.1.1; using 10.XX.XX.XX instead (on interface enp0s31f6)
2018-06-14 16:50:41 WARN  Utils:66 - Set SPARK_LOCAL_IP if you need to bind to another address
Warning: Local jar /home/yarn/Documents/Scala-Examples/./examples/jars/spark-examples_2.11-2.3.0.jar does not exist, skipping.
2018-06-14 16:50:42 INFO  RMProxy:98 - Connecting to ResourceManager at /0.0.0.0:8032
2018-06-14 16:50:44 INFO  Client:871 - Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)

And in the end, it gives the exception

    Exception in thread "main" java.net.ConnectException: Call From ukaleem-HP-EliteBook-850-G3/127.0.1.1 to 0.0.0.0:8032 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
    at sun.reflect.GeneratedConstructorAccessor4.newInstance(Unknown Source)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:792)
    at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:732)
    at org.apache.hadoop.ipc.Client.call(Client.java:1479)
    at org.apache.hadoop.ipc.Client.call(Client.java:1412)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
    at com.sun.proxy.$Proxy8.getClusterMetrics(Unknown Source)
    at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getClusterMetrics(ApplicationClientProtocolPBClientImpl.java:206)
    at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
    at com.sun.proxy.$Proxy9.getClusterMetrics(Unknown Source)
    at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getYarnClusterMetrics(YarnClientImpl.java:487)
    at org.apache.spark.deploy.yarn.Client$$anonfun$submitApplication$1.apply(Client.scala:155)
    at org.apache.spark.deploy.yarn.Client$$anonfun$submitApplication$1.apply(Client.scala:155)
    at org.apache.spark.internal.Logging$class.logInfo(Logging.scala:54)
    at org.apache.spark.deploy.yarn.Client.logInfo(Client.scala:59)
    at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:154)
    at org.apache.spark.deploy.yarn.Client.run(Client.scala:1146)
    at org.apache.spark.deploy.yarn.YarnClusterApplication.start(Client.scala:1518)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:879)
    at org.apache.spark.deploy.SparkSubmit$$anon$1.run(SparkSubmit.scala:179)
    at org.apache.spark.deploy.SparkSubmit$$anon$1.run(SparkSubmit.scala:177)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:177)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:227)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:136)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
    at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:495)
    at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:614)
    at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:712)
    at org.apache.hadoop.ipc.Client$Connection.access$2900(Client.java:375)
    at org.apache.hadoop.ipc.Client.getConnection(Client.java:1528)
    at org.apache.hadoop.ipc.Client.call(Client.java:1451)
    ... 28 more
2018-06-14 17:10:53 INFO  ShutdownHookManager:54 - Shutdown hook called
2018-06-14 17:10:53 INFO  ShutdownHookManager:54 - Deleting directory /tmp/spark-5bddb7f3-165f-451c-8ab4-bb7729f4237c

EDIT : After adding config files to my spark/conf dir, I get this error now.

The files I added are

*core-site.xml

dfs.hosts

masters

slaves

yarn-site.xml*

And some more. What I understand is that I only need yarn-site.xml to tell spark the location of the yarn cluster. (ids, address, hostname etc).

All this time I had been thinking that even we want to submit a job on Yarn these config need to go in /etc/Hadoop dir not in Spark/conf. Whats the purpose of installing hadoop then (other than communicating)? And following this question. If the config need to go in spark/conf then HADOOP_CONF_DIR & YARN_CONF_DIR should point to etc/hadoop dir or spark/conf?

    INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm2
18/06/19 11:04:50 INFO retry.RetryInvocationHandler: Exception while invoking getClusterMetrics of class ApplicationClientProtocolPBClientImpl over rm2 after 1 fail over attempts. Trying to fail over after sleeping for 38176ms.
java.net.ConnectException: Call From ukaleem-HP-EliteBook-850-G3/127.0.1.1 to svc-hadoop-mgnt-pre-c2-01.jamba.net:8032 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:792)
    at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:732)
    at org.apache.hadoop.ipc.Client.call(Client.java:1479)
    at org.apache.hadoop.ipc.Client.call(Client.java:1412)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
    at com.sun.proxy.$Proxy13.getClusterMetrics(Unknown Source)
    at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getClusterMetrics(ApplicationClientProtocolPBClientImpl.java:206)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
    at com.sun.proxy.$Proxy14.getClusterMetrics(Unknown Source)
    at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getYarnClusterMetrics(YarnClientImpl.java:487)
    at org.apache.spark.deploy.yarn.Client$$anonfun$submitApplication$1.apply(Client.scala:155)
    at org.apache.spark.deploy.yarn.Client$$anonfun$submitApplication$1.apply(Client.scala:155)
    at org.apache.spark.internal.Logging$class.logInfo(Logging.scala:54)
    at org.apache.spark.deploy.yarn.Client.logInfo(Client.scala:59)
    at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:154)
    at org.apache.spark.deploy.yarn.Client.run(Client.scala:1146)
    at org.apache.spark.deploy.yarn.YarnClusterApplication.start(Client.scala:1518)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:879)
    at org.apache.spark.deploy.SparkSubmit$$anon$1.run(SparkSubmit.scala:179)
    at org.apache.spark.deploy.SparkSubmit$$anon$1.run(SparkSubmit.scala:177)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:177)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:227)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:136)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
    at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:495)
    at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:614)
    at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:712)
    at org.apache.hadoop.ipc.Client$Connection.access$2900(Client.java:375)
    at org.apache.hadoop.ipc.Client.getConnection(Client.java:1528)
    at org.apache.hadoop.ipc.Client.call(Client.java:1451)
    ... 29 more

回答1:

Assuming you have a fully distributed yarn cluster: your spark-submit script is unable to find the configuration for the yarn resourcemanager (basically the yarn master node). Ensure you have HADOOP_CONF_DIR properly set in your environment, and that it points to your cluster's configuration. Specifically your yarn-site.xml.

Edit: more detail

The hadoop package comes with both server and client software. The server software would be the many daemons that run that make up the cluster. If your workstation is acting as a client (using that term loosely, not fully related to sparks --deploy-mode), then the hadoop client software must know the network locations of the server daemons running in the cluster. If your yarn-site.xml is empty, then it is pulling it's default values from yarn-defauls.xml (which is hard-coded, I believe).

Assuming your cluster is not running in HA mode, and is a mostly default configuration, then your workstation's yarn-site.xml should contain at least an entry like the following:

<property>
  <name>yarn.resourcemanager.hostname</name>
  <value>rm-host.yourdomain.com</value>
</property>

Obviously, replace the hostname with the hostname where your actual resource manager is running. Of course, any spark interaction with HDFS will require a properly configured hdfs-site.xml, etc.

Some cluster managing software will have something like "generate client configs" (thinking of my cloudera experience specifically), which will give you a .tar.gz with all of the config files correctly populated to access the cluster from an external workstation.

Further recommendations: If you plan to do spark on yarn a lot in this cluster, spark recommends making sure that you have the external shuffle service configured to launch with your yarn node managers. (Please bear in mind, this config directive would have to be present in the yarn-site.xml where yarn's node manager services are running, not on your workstation.

回答2:

If you are running this on your local machine,

Update your /etc/hosts file, Enter 127.0.0.1 against your hostname.