I have an Apache Spark 0.9.0 cluster installed, on which I am trying to deploy code that reads a file from HDFS. The code throws a warning and the job eventually fails. Here is the code:
import org.apache.spark.{SparkConf, SparkContext}

/**
 * Running this code fails with the warning:
 * Initial job has not accepted any resources; check your cluster UI to ensure that
 * workers are registered and have sufficient memory
 */
object Main extends App {
  val sconf = new SparkConf()
    .setMaster("spark://labscs1:7077")
    .setAppName("spark scala")
  val sctx = new SparkContext(sconf)
  sctx.parallelize(1 to 100).count
}
Below is the warning message:
Initial job has not accepted any resources; check your cluster UI to
ensure that workers are registered and have sufficient memory
How can I get rid of this, or am I missing some configuration?
You get this when either the number of cores or the amount of RAM (per node) that you request via spark.cores.max and spark.executor.memory, respectively, exceeds what is available. So even if no one else is using the cluster, if you specify that you want to use, say, 100GB of RAM per node but your nodes can only support 90GB, you will get this error message.
To be fair, the message is vague in this situation; it would be more helpful if it said you're exceeding the maximum.
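As a rough sketch of the point above, the two settings can be passed explicitly on the SparkConf so that the request stays within what the workers advertise in the cluster UI. The master URL and app name are taken from the question; the values "512m" and "2" are placeholders I am assuming for illustration, not recommendations, and should be replaced with figures below what your own workers report as free.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object ModestResources {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setMaster("spark://labscs1:7077")      // master URL from the question
      .setAppName("spark scala")
      .set("spark.executor.memory", "512m")   // assumed value; keep it below a worker's free memory
      .set("spark.cores.max", "2")            // assumed value; keep it within the cores shown as free
    val sc = new SparkContext(conf)
    try {
      // Same trivial job as in the question
      println(sc.parallelize(1 to 100).count())
    } finally {
      sc.stop()
    }
  }
}
```

If the warning disappears with small values like these, the original request was probably larger than what the workers could offer.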
Looks like Spark master can't assign any workers for this task. Either the workers aren't started or they are all busy.
Check the Spark UI on the master node (the port is specified by SPARK_MASTER_WEBUI_PORT in spark-env.sh, 8080 by default). It should look like this:
For the cluster to function properly:
- There must be some workers with state "Alive"
- There must be some cores available (for example, if all cores are busy with a frozen task, the cluster won't accept new tasks)
- There must be sufficient memory available
Also make sure your Spark workers can communicate both ways with the driver. Check for firewalls, etc.
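One way to start checking connectivity is a plain TCP probe from the JVM. The sketch below uses only the Java standard library; `PortCheck` and `canConnect` are names I made up for illustration, and the default host and port (labscs1, 7077) are taken from the question. It only tests one direction (the machine it runs on to the target), so to cover "both ways" you would run it from the driver against the master and workers, and from a worker against the driver's host and port.

```scala
import java.io.IOException
import java.net.{InetSocketAddress, Socket}

object PortCheck {
  /** Returns true if a TCP connection to host:port opens within timeoutMs. */
  def canConnect(host: String, port: Int, timeoutMs: Int = 2000): Boolean = {
    val socket = new Socket()
    try {
      socket.connect(new InetSocketAddress(host, port), timeoutMs)
      true
    } catch {
      // Covers refused connections, timeouts and unresolvable host names
      case _: IOException => false
    } finally {
      socket.close()
    }
  }

  def main(args: Array[String]): Unit = {
    val host = args.headOption.getOrElse("labscs1")        // host from the question
    val port = args.lift(1).map(_.toInt).getOrElse(7077)   // master port from the question
    println(s"$host:$port reachable: ${canConnect(host, port)}")
  }
}
```

A `false` result here points at a firewall, a wrong host entry, or a service that is not listening on that address.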
I had this exact issue. I had a simple 1-node Spark cluster and was getting this error when trying to run my Spark app.
I ran through some of the suggestions above. It was when I tried to run the Spark shell against the cluster and could not see it in the UI that I became suspicious that my cluster was not working correctly.
In my hosts file I had an entry, let's say SparkNode, that referenced the correct IP address.
I had inadvertently put the wrong IP address in the conf/spark-env.sh file against the SPARK_MASTER_IP variable. I changed this to SparkNode and I also changed SPARK_LOCAL_IP to point to SparkNode.
To test this I opened up the UI using SparkNode:7077 in the browser and I could see an instance of Spark running.
I then used Wildfire's suggestion of running the Spark shell, as follows:
MASTER=spark://SparkNode:7077 bin/spark-shell
Going back to the UI I could now see the Spark shell application running, which I couldn't before.
So I exited the Spark shell and ran my app using Spark Submit and it now works correctly.
It is definitely worth checking all of your IP and host entries; this was the root cause of my problem.
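A quick way to see what a host entry actually resolves to, from the JVM's point of view, is something like the sketch below. It uses only the Java standard library; `HostCheck` and `resolve` are hypothetical names, and SparkNode is the placeholder host name from this answer. Run it on each machine and compare the printed address with what you put in conf/spark-env.sh.

```scala
import java.net.{InetAddress, UnknownHostException}

object HostCheck {
  /** Resolves a host name as the JVM sees it; None if it does not resolve. */
  def resolve(host: String): Option[String] =
    try Some(InetAddress.getByName(host).getHostAddress)
    catch { case _: UnknownHostException => None }

  def main(args: Array[String]): Unit = {
    // Defaults to the placeholder name used in this answer
    val names = if (args.nonEmpty) args.toSeq else Seq("SparkNode")
    names.foreach { name =>
      println(s"$name -> ${resolve(name).getOrElse("does not resolve")}")
    }
  }
}
```

If a name resolves to a loopback address or to an unexpected IP, that mismatch is worth fixing before looking at memory or cores.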
You need to specify the right SPARK_HOME and your driver program's IP address, in case Spark is not able to locate your Netty jar server. Be aware that your Spark master should listen on the IP address you intend to use. This can be done by setting SPARK_MASTER_IP=yourIP in the file spark-env.sh.
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .setAppName("test")
  .setMaster("spark://yourSparkMaster:7077")
  .setSparkHome("YourSparkHomeDir")
  .set("spark.driver.host", "YourIPAddr")
Check for errors regarding hostname, IP address and loopback. Make sure to set SPARK_LOCAL_IP and SPARK_MASTER_IP.
I had a similar issue ("Initial job has not accepted any resources") and fixed it by specifying the correct Spark download URL in spark-env.sh, or by installing Spark on all slaves.
export SPARK_EXECUTOR_URI=http://mirror.fibergrid.in/apache/spark/spark-1.6.1/spark-1.6.1-bin-hadoop2.6.tgz