Is it possible to run a Spark standalone cluster locally on just one machine? (This is different from merely developing jobs locally, i.e., with local[*].)
So far I have been running two different VMs to build a cluster. What if I could run a standalone cluster on the very same machine, with, for instance, three different JVMs running?
Could something like having multiple loopback addresses do the trick?
Yes, you can do it: launch one master and one worker node and you are good to go.
Launch the master:
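This script ships with the standard Spark distribution; run it from the root of your Spark installation:

    # start the standalone master; its web UI (port 8080 by default) shows the spark://... master URL
    ./sbin/start-master.sh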
Launch a worker:
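The worker needs the master's URL. Assuming the master listens on the default port 7077 on localhost (the -c and -m values, one core and 512 MB, are just illustrative resource limits):

    # start a worker and register it with the local master
    ./bin/spark-class org.apache.spark.deploy.worker.Worker spark://localhost:7077 -c 1 -m 512M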
Run the SparkPi example:
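Submit it with spark-submit, pointing --master at the standalone master. The examples jar path is version-dependent; the one below matches the Spark 2.1.0 layout, so adjust it to your distribution:

    # submit the bundled SparkPi example to the standalone cluster
    ./bin/spark-submit --class org.apache.spark.examples.SparkPi \
        --master spark://localhost:7077 \
        examples/jars/spark-examples_2.11-2.1.0.jar 100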
See the Apache Spark Standalone Mode documentation: https://spark.apache.org/docs/latest/spark-standalone.html
A small update: as of the latest version (2.1.0), the default is to bind the master to the hostname, so when starting a worker locally, use the output of the hostname command.
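Something like this should work (again assuming the default master port 7077); the backticks substitute your machine's hostname into the master URL:

    # register the worker under the hostname the master bound to
    ./bin/spark-class org.apache.spark.deploy.worker.Worker spark://`hostname`:7077 -c 1 -m 512M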
And to run an example, simply run the following command:
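(run-example wraps spark-submit for the bundled examples; without an explicit master it falls back to local mode, so setting MASTER as below is my assumption about one way to target the standalone cluster; passing --master should work too.)

    # run the bundled SparkPi example against the standalone master
    MASTER=spark://`hostname`:7077 ./bin/run-example SparkPi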
If you can't find the ./sbin/start-master.sh file on your machine, you can also start the master directly.
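One way is the spark-class launcher, the same one used for the worker command above:

    # start the master directly via its launcher class
    ./bin/spark-class org.apache.spark.deploy.master.Master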