Is there the possibility to run the Spark standalone cluster locally on just one machine (which is basically different from just developing jobs locally (i.e., local[*]
))?.
So far I am running 2 different VMs to build a cluster, what if I could run a standalone cluster on the very same machine, having for instance three different JVMs running?
Could something like having multiple loopback addresses do the trick?
yes you can do it, launch one master and one worker node and you are good to go
launch master
./sbin/start-master.sh
launch worker
./bin/spark-class org.apache.spark.deploy.worker.Worker spark://localhost:7077 -c 1 -m 512M
run SparkPi example
./bin/spark-submit --class org.apache.spark.examples.SparkPi --master spark://localhost:7077 lib/spark-examples-1.2.1-hadoop2.4.0.jar
Apache Spark Standalone Mode Documentation
A small update as for the latest version (the 2.1.0), the default is to bind the master to the hostname, so when starting a worker locally use the output of hostname
:
./bin/spark-class org.apache.spark.deploy.worker.Worker spark://`hostname`:7077 -c 1 -m 512M
And to run an example, simply run the following command:
bin/run-example SparkPi
If you can't find the ./sbin/start-master.sh
file on your machine, you can start the master also with
./bin/spark-class org.apache.spark.deploy.master.Master