I have a small cluster with 3 machines, and another machine for developing and testing. When developing, I set SparkContext
to local
. When everything is OK, I want to deploy the Jar file I build to every node. Basically I manually move this jar to cluster and copy to HDFS which shared by the cluster. Then I could change the code to:
//standalone mode
val sc = new SparkContext(
"spark://mymaster:7077",
"Simple App",
"/opt/spark-0.9.1-bin-cdh4", //spark home
List("hdfs://namenode:8020/runnableJars/SimplyApp.jar") //jar location
)
to run it in my IDE. My question: Is there any way easier to move this jar to cluster?
In Spark, the program creating the SparkContext is called 'the driver'. It's sufficient that the jar file with your job is available to the local file system of the driver in order for it to pick it up and ship it to the master/workers.
In concrete, your config will look like:
Under the hood, the driver will start a server where the workers will download the jar file(s) from the driver. It's therefore important (and often an issue) that the workers have network access to the driver. This can often be ensured by setting 'spark.local.ip' on the driver in a network that's accessible/routable from the workers.