I'm trying to run a simple Spark app in Standalone mode on Mac.
I managed to run ./sbin/start-master.sh
to start the master server and worker.
./bin/spark-shell --master spark://MacBook-Pro.local:7077
also works, and I can see it in the list of running applications in the Master web UI.
Now I'm trying to write a simple Spark app:
import org.apache.spark.{SparkContext, SparkConf}
import org.apache.spark.SparkContext._

object SimpleApp {
  def main(args: Array[String]) {
    val conf = new SparkConf().setAppName("Simple Application")
      .setMaster("spark://MacBook-Pro.local:7077")
    val sc = new SparkContext(conf)
    val logFile = "README.md"
    val logData = sc.textFile(logFile, 2).cache()
    val numAs = logData.filter(line => line.contains("a")).count()
    val numBs = logData.filter(line => line.contains("b")).count()
    println("Lines with a: %s, Lines with b: %s".format(numAs, numBs))
  }
}
Running this simple app gives me an error message saying that the master is unresponsive:
15/02/15 09:47:47 INFO AppClient$ClientActor: Connecting to master spark://MacBook-Pro.local:7077...
15/02/15 09:47:48 WARN ReliableDeliverySupervisor: Association with remote system [akka.tcp://sparkMaster@MacBook-Pro.local:7077] has failed, address is now gated for [5000] ms. Reason is: [Disassociated].
15/02/15 09:48:07 INFO AppClient$ClientActor: Connecting to master spark://MacBook-Pro.local:7077...
15/02/15 09:48:07 WARN ReliableDeliverySupervisor: Association with remote system [akka.tcp://sparkMaster@MacBook-Pro.local:7077] has failed, address is now gated for [5000] ms. Reason is: [Disassociated].
15/02/15 09:48:27 INFO AppClient$ClientActor: Connecting to master spark://MacBook-Pro.local:7077...
15/02/15 09:48:27 WARN ReliableDeliverySupervisor: Association with remote system [akka.tcp://sparkMaster@MacBook-Pro.local:7077] has failed, address is now gated for [5000] ms. Reason is: [Disassociated].
15/02/15 09:48:47 ERROR SparkDeploySchedulerBackend: Application has been killed. Reason: All masters are unresponsive! Giving up.
15/02/15 09:48:47 WARN SparkDeploySchedulerBackend: Application ID is not initialized yet.
15/02/15 09:48:47 ERROR TaskSchedulerImpl: Exiting due to error from cluster scheduler: All masters are unresponsive! Giving up.
Any idea what the problem is?
Thanks
You can set the master either when calling spark-submit or (as you've done here) explicitly via the SparkConf. Try following the example in the Spark Configuration docs and setting the master as follows:
val conf = new SparkConf().setMaster("local[2]")
From the same page (explaining the number in brackets that follows local): "Note that we run with local[2], meaning two threads - which represents “minimal” parallelism, which can help detect bugs that only exist when we run in a distributed context."
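For the spark-submit route, a minimal sketch might look like the following. The jar path and the MASTER_URL variable are assumptions for illustration, not values from the question; SimpleApp is the object name from the question. Note that properties set directly on the SparkConf take precedence over spark-submit flags, so the setMaster call would have to be removed from the code for --master to take effect.

```shell
#!/bin/sh
# Assumed values for illustration; replace with your own class, jar, and master URL.
APP_CLASS="SimpleApp"
APP_JAR="target/scala-2.10/simple-project_2.10-1.0.jar"   # hypothetical build output
MASTER_URL="${MASTER_URL:-local[2]}"                      # falls back to local mode

# Show the command that will be run.
SUBMIT_CMD="./bin/spark-submit --class $APP_CLASS --master $MASTER_URL $APP_JAR"
echo "$SUBMIT_CMD"

# Submit only when both spark-submit and the jar are actually present.
if [ -x ./bin/spark-submit ] && [ -f "$APP_JAR" ]; then
  ./bin/spark-submit --class "$APP_CLASS" --master "$MASTER_URL" "$APP_JAR"
fi
```

Run from SPARK_HOME, this lets you switch between local[2] and the standalone master by changing MASTER_URL instead of recompiling.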
I had the same issue and finally solved it. In my case, I had written the source code against Scala 2.11, but I had built Spark with Maven using the default command:
build/mvn -Phadoop-2.4 -Dhadoop.version=2.4.0 -DskipTests clean package
This build script sets the Scala version to 2.10. Because the Spark client and the master then use different Scala versions, messages that the client sends to the master via the remote actor fail with incompatible serialization, and the console eventually shows the "All masters are unresponsive" error.
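One rough way to see which Scala version a source build of Spark was assembled for is to look at the build output directories. The sketch below assumes the build places its assembly under assembly/target/scala-<version>; the function name is made up for illustration. Compare its output with the Scala version your application is compiled against (for example, the output of scala -version or the version in your build file).

```shell
#!/bin/sh
# List the Scala versions a Spark source build was assembled for, judging by
# directories named assembly/target/scala-<version> (an assumed layout).
spark_build_scala_versions() {
  for dir in "$1"/assembly/target/scala-*; do
    # Skip the unexpanded glob when no such directory exists.
    [ -d "$dir" ] || continue
    basename "$dir" | sed 's/^scala-//'
  done
}

# Prints nothing if no build output is found under SPARK_HOME.
spark_build_scala_versions "${SPARK_HOME:-.}"
```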
My solution:
1. Rebuild Spark for Scala 2.11 (make sure your development environment also uses Scala 2.11). Run the following commands in SPARK_HOME:
dev/change-version-to-2.11.sh
mvn -Pyarn -Phadoop-2.4 -Dscala-2.11 -DskipTests clean package
After building, the package is located in SPARK_HOME/assembly/target/scala-2.11. If you now start your Spark server using start-all.sh, it will report that the target package can't be found.
2. Go to the conf folder and edit the spark-env.sh file. Append the following line:
export SPARK_SCALA_VERSION="2.11"
Then run start-all.sh, set the correct master URL in your program, and run it. Done!
Note: The error message in the console does not give enough detail, so you need to turn logging on to see what is happening. Go to the conf folder and copy log4j.properties.template to log4j.properties. After the Spark master starts, the log files are saved in the SPARK_HOME/logs folder.
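The logging step can be sketched as a small script. The function names are hypothetical, and the script scans every file in the logs folder rather than assuming a particular log file name.

```shell
#!/bin/sh
# Copy the log4j template into place unless a log4j.properties already exists.
enable_spark_logging() {
  conf_dir="$1/conf"
  if [ -f "$conf_dir/log4j.properties.template" ] && [ ! -f "$conf_dir/log4j.properties" ]; then
    cp "$conf_dir/log4j.properties.template" "$conf_dir/log4j.properties"
  fi
}

# Print WARN and ERROR lines from every file in the logs folder, if any exist.
show_spark_log_problems() {
  for log in "$1"/logs/*; do
    [ -f "$log" ] || continue
    # grep exits non-zero when nothing matches; that is not an error here.
    grep -H -E 'WARN|ERROR' "$log" || true
  done
}

enable_spark_logging "${SPARK_HOME:-.}"
show_spark_log_problems "${SPARK_HOME:-.}"
```

Restart the master after copying the file so that the new logging configuration is picked up, then rerun the second function once the failure has been reproduced.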
I write my code in Java, but I had the same problem as you, because my Scala version was 2.10 while my dependencies were built for 2.11. I changed spark-core_2.11 and spark-sql_2.11 to spark-core_2.10 and spark-sql_2.10 in pom.xml. Maybe you can solve your issue in a similar way.
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.10</artifactId>
    <version>${spark.version}</version>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.10</artifactId>
    <version>${spark.version}</version>
</dependency>
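Since the Scala version is encoded as a suffix of each artifact ID, a quick consistency check over the artifact names can catch this kind of mismatch before running anything. This is only a sketch: both function names are made up for illustration, and the artifact IDs are passed in by hand rather than parsed from pom.xml.

```shell
#!/bin/sh
# Extract the Scala binary version suffix from an artifact ID,
# e.g. spark-core_2.10 -> 2.10. Prints nothing if there is no such suffix.
artifact_scala_version() {
  printf '%s\n' "$1" | sed -n 's/^.*_\([0-9][0-9]*\.[0-9][0-9]*\)$/\1/p'
}

# Report every artifact whose suffix differs from the expected Scala version.
# Returns non-zero if at least one artifact does not match.
check_artifacts() {
  expected="$1"; shift
  status=0
  for artifact in "$@"; do
    actual=$(artifact_scala_version "$artifact")
    if [ "$actual" != "$expected" ]; then
      echo "MISMATCH: $artifact (expected Scala $expected)"
      status=1
    fi
  done
  return $status
}

check_artifacts 2.10 spark-core_2.10 spark-sql_2.10 && echo "all artifacts match Scala 2.10"
```

The expected version passed as the first argument should be the Scala version the cluster's Spark build was compiled with.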