NOTE: The author is looking for ways to set the Spark master when running Spark examples that involve no changes to the source code, only options that can be set from the command line, if at all possible.
Let us consider the run() method of the BinaryClassification example:
    def run(params: Params) {
      val conf = new SparkConf().setAppName(s"BinaryClassification with $params")
      val sc = new SparkContext(conf)
Notice that the SparkConf provides no way to configure the Spark master.
When running this program from IntelliJ with the following arguments:
--algorithm LR --regType L2 --regParam 1.0 data/mllib/sample_binary_classification_data.txt
the following error occurs:
    Exception in thread "main" org.apache.spark.SparkException: A master URL must be set in your configuration
        at org.apache.spark.SparkContext.<init>(SparkContext.scala:166)
        at org.apache.spark.examples.mllib.BinaryClassification$.run(BinaryClassification.scala:105)
I have also tried adding the Spark master URL anyway (though the code does not seem to support it):
    spark://10.213.39.125:17088 --algorithm LR --regType L2 --regParam 1.0 data/mllib/sample_binary_classification_data.txt
and
    --algorithm LR --regType L2 --regParam 1.0 spark://10.213.39.125:17088 data/mllib/sample_binary_classification_data.txt
Both fail with the error:
Error: Unknown argument 'data/mllib/sample_binary_classification_data.txt'
For reference, here is the option parsing, which does nothing with the Spark master:
    val parser = new OptionParser[Params]("BinaryClassification") {
      head("BinaryClassification: an example app for binary classification.")
      opt[Int]("numIterations")
        .text("number of iterations")
        .action((x, c) => c.copy(numIterations = x))
      opt[Double]("stepSize")
        .text(s"initial step size, default: ${defaultParams.stepSize}")
        .action((x, c) => c.copy(stepSize = x))
      opt[String]("algorithm")
        .text(s"algorithm (${Algorithm.values.mkString(",")}), " +
          s"default: ${defaultParams.algorithm}")
        .action((x, c) => c.copy(algorithm = Algorithm.withName(x)))
      opt[String]("regType")
        .text(s"regularization type (${RegType.values.mkString(",")}), " +
          s"default: ${defaultParams.regType}")
        .action((x, c) => c.copy(regType = RegType.withName(x)))
      opt[Double]("regParam")
        .text(s"regularization parameter, default: ${defaultParams.regParam}")
        .action((x, c) => c.copy(regParam = x))
      arg[String]("<input>")
        .required()
        .text("input paths to labeled examples in LIBSVM format")
        .action((x, c) => c.copy(input = x))
    }
So, yes, I could go ahead and modify the source code. But I suspect I am instead missing an available tuning knob that would make this work without changing the source.
You can set the Spark master from the command line by adding a JVM parameter:
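For example (the host and port are placeholders; Spark picks up JVM system properties prefixed with spark.):

    -Dspark.master=spark://myhost:7077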
If you want to do this from code, you can call .setMaster(...) when creating the SparkConf.
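A minimal sketch of what that could look like for this example (the master URL is just an illustration; use your own cluster URL or a local master):

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName(s"BinaryClassification with $params")
      .setMaster("local[4]") // or e.g. "spark://10.213.39.125:17088"
    val sc = new SparkContext(conf)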
Long overdue EDIT (as per the comments):
For the SparkSession in Spark 2.x+:
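A minimal sketch using the SparkSession builder (the app name and the local[*] master are placeholders):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .master("local[*]") // or a spark:// URL for a standalone cluster
      .appName("BinaryClassification")
      .getOrCreate()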
Command line (2.x), assuming a local standalone cluster:
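For example, launching spark-shell against a standalone master on the default port (the URL is illustrative):

    spark-shell --master spark://localhost:7077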
So here is the solution.
Set it to local, which uses 1 thread by default.
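For example (assuming the property is passed as a JVM option, e.g. in the IDE run configuration's VM options):

    -Dspark.master=local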
Or with an argument (i.e., the number of threads in brackets).
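For example, two threads (same assumed VM-options mechanism):

    -Dspark.master=local[2]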
I downloaded Spark 1.3.0 and wanted to test the Java samples using Eclipse Luna 4.4. I found that to run the Java samples you need to add spark-assembly-1.3.0-hadoop2.4.0.jar as a referenced library to your Java project.
The fastest way to start with Spark using Java is to run the JavaWordCount example. To fix the above issue, add the following line to the Spark configuration:
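Presumably a line along these lines (written in Java to match the JavaWordCount sample; the app name and the local[2] master are assumptions):

    SparkConf sparkConf = new SparkConf().setAppName("JavaWordCount").setMaster("local[2]");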
And that's it; try running it with Eclipse and you should succeed. If you see an error in the console, just ignore it: scroll the console down and you'll see your input text file printed line by line, followed by a count of the words.
This is a fast way to get started with Spark on Windows without worrying about installing Hadoop; you just need JDK 6 and Eclipse.
As the documentation mentions:

    setMaster(String master)

The master URL to connect to, such as local to run locally with one thread, local[4] to run locally with 4 cores, or spark://master:7077 to run on a Spark standalone cluster.