NOTE: The author is looking for ways to set the Spark master when running Spark examples that involve no changes to the source code, only options that can be set from the command line, if at all possible.
Let us consider the run() method of the BinaryClassification example:
    def run(params: Params) {
      val conf = new SparkConf().setAppName(s"BinaryClassification with $params")
      val sc = new SparkContext(conf)
Notice that the SparkConf provides no way to configure the Spark master.
When running this program from IntelliJ with the following arguments:
--algorithm LR --regType L2 --regParam 1.0 data/mllib/sample_binary_classification_data.txt
the following error occurs:
    Exception in thread "main" org.apache.spark.SparkException: A master URL must be set in your configuration
        at org.apache.spark.SparkContext.<init>(SparkContext.scala:166)
        at org.apache.spark.examples.mllib.BinaryClassification$.run(BinaryClassification.scala:105)
I have also tried adding the Spark master URL anyway (though the code does not seem to support it):
    spark://10.213.39.125:17088 --algorithm LR --regType L2 --regParam 1.0 data/mllib/sample_binary_classification_data.txt
and
    --algorithm LR --regType L2 --regParam 1.0 spark://10.213.39.125:17088 data/mllib/sample_binary_classification_data.txt
Both fail with the error:
Error: Unknown argument 'data/mllib/sample_binary_classification_data.txt'
For reference, here is the option parsing, which does nothing with the Spark master:
    val parser = new OptionParser[Params]("BinaryClassification") {
      head("BinaryClassification: an example app for binary classification.")
      opt[Int]("numIterations")
        .text("number of iterations")
        .action((x, c) => c.copy(numIterations = x))
      opt[Double]("stepSize")
        .text(s"initial step size, default: ${defaultParams.stepSize}")
        .action((x, c) => c.copy(stepSize = x))
      opt[String]("algorithm")
        .text(s"algorithm (${Algorithm.values.mkString(",")}), " +
          s"default: ${defaultParams.algorithm}")
        .action((x, c) => c.copy(algorithm = Algorithm.withName(x)))
      opt[String]("regType")
        .text(s"regularization type (${RegType.values.mkString(",")}), " +
          s"default: ${defaultParams.regType}")
        .action((x, c) => c.copy(regType = RegType.withName(x)))
      opt[Double]("regParam")
        .text(s"regularization parameter, default: ${defaultParams.regParam}")
        .action((x, c) => c.copy(regParam = x))
      arg[String]("<input>")
        .required()
        .text("input paths to labeled examples in LIBSVM format")
        .action((x, c) => c.copy(input = x))
    }
So, yes, I could go ahead and modify the source code. But I suspect I am instead missing an available tuning knob that would make this work without changing the source.
You can set the Spark master from the command line by adding a JVM parameter:
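For example (the host and port are placeholders; Spark picks up JVM system properties prefixed with spark.):

    -Dspark.master=spark://myhost:7077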
If you want to do this from code, you can call .setMaster(...) when creating the SparkConf.
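A minimal sketch of what that could look like for this example (the master URL is just an illustration; use your own cluster URL or a local master):

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName(s"BinaryClassification with $params")
      .setMaster("local[4]") // or e.g. "spark://10.213.39.125:17088"
    val sc = new SparkContext(conf)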
Long overdue EDIT (as per the comments):
For the SparkSession in Spark 2.x+:
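A minimal sketch using the SparkSession builder (the app name and the local[*] master are placeholders):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .master("local[*]") // or a spark:// URL for a standalone cluster
      .appName("BinaryClassification")
      .getOrCreate()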
Command line (2.x), assuming a local standalone cluster:
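For example, launching spark-shell against a standalone master on the default port (the URL is illustrative):

    spark-shell --master spark://localhost:7077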
So here is the solution.
Set it to local, which uses 1 thread by default.
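For example (assuming the property is passed as a JVM option, e.g. in the IDE run configuration's VM options):

    -Dspark.master=local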
Or with an argument (i.e., the number of threads in brackets).
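For example, two threads (same assumed VM-options mechanism):

    -Dspark.master=local[2]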
I downloaded Spark 1.3.0 and wanted to test the Java samples using Eclipse Luna 4.4. I found that to run the Java samples you need to add spark-assembly-1.3.0-hadoop2.4.0.jar as a referenced library to your Java project.
The fastest way to start with Spark using Java is to run the JavaWordCount example. To fix the above issue, add the following line to the Spark configuration:
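Presumably a line along these lines (written in Java to match the JavaWordCount sample; the app name and the local[2] master are assumptions):

    SparkConf sparkConf = new SparkConf().setAppName("JavaWordCount").setMaster("local[2]");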
And that's it; try running it with Eclipse and you should succeed. If you see an error in the console, just ignore it: scroll the console down and you'll see your input text file printed line by line, followed by a count of the words.
This is a fast way to get started with Spark on Windows without worrying about installing Hadoop; you just need JDK 6 and Eclipse.
As the documentation mentions:

    setMaster(String master)

The master URL to connect to, such as local to run locally with one thread, local[4] to run locally with 4 cores, or spark://master:7077 to run on a Spark standalone cluster.