How to set the Master address for Spark examples from the command line

Posted 2019-01-22 02:18

NOTE: The author is looking for ways to set the Spark master when running the Spark examples that involve no changes to the source code, but rather only options that can be set from the command line, if at all possible.

Let us consider the run() method of the BinaryClassification example:

  def run(params: Params) {
    val conf = new SparkConf().setAppName(s"BinaryClassification with $params")
    val sc = new SparkContext(conf)

Notice that the SparkConf does not provide any means to configure the Spark master.

When running this program from IntelliJ with the following arguments:

--algorithm LR --regType L2 --regParam 1.0 data/mllib/sample_binary_classification_data.txt

the following error occurs:

Exception in thread "main" org.apache.spark.SparkException: A master URL must be set
in your configuration
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:166)
    at org.apache.spark.examples.mllib.BinaryClassification$.run(BinaryClassification.scala:105)

I have also tried adding the Spark master URL anyway (though the code does not seem to support it):

  spark://10.213.39.125:17088   --algorithm LR --regType L2 --regParam 1.0 
  data/mllib/sample_binary_classification_data.txt

and

--algorithm LR --regType L2 --regParam 1.0 spark://10.213.39.125:17088
data/mllib/sample_binary_classification_data.txt

Neither works; both fail with the error:

Error: Unknown argument 'data/mllib/sample_binary_classification_data.txt'

For reference, here is the option parsing, which does nothing with the Spark master:

val parser = new OptionParser[Params]("BinaryClassification") {
  head("BinaryClassification: an example app for binary classification.")
  opt[Int]("numIterations")
    .text("number of iterations")
    .action((x, c) => c.copy(numIterations = x))
  opt[Double]("stepSize")
    .text(s"initial step size, default: ${defaultParams.stepSize}")
    .action((x, c) => c.copy(stepSize = x))
  opt[String]("algorithm")
    .text(s"algorithm (${Algorithm.values.mkString(",")}), " +
    s"default: ${defaultParams.algorithm}")
    .action((x, c) => c.copy(algorithm = Algorithm.withName(x)))
  opt[String]("regType")
    .text(s"regularization type (${RegType.values.mkString(",")}), " +
    s"default: ${defaultParams.regType}")
    .action((x, c) => c.copy(regType = RegType.withName(x)))
  opt[Double]("regParam")
    .text(s"regularization parameter, default: ${defaultParams.regParam}")
  arg[String]("<input>")
    .required()
    .text("input paths to labeled examples in LIBSVM format")
    .action((x, c) => c.copy(input = x))

So, yes, I could go ahead and modify the source code. But I suspect I am instead missing an available knob that would make this work without modifying the source.
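
For reference, the source change itself would be small. Here is a sketch only (it assumes a master: String field is added to Params, which the real code does not have; avoiding exactly this kind of edit is the point of the question):

  opt[String]("master")
    .text("Spark master URL, e.g. spark://host:7077 or local[4]")
    .action((x, c) => c.copy(master = x))  // requires adding `master` to Params

  // ... and then in run():
  val conf = new SparkConf()
    .setAppName(s"BinaryClassification with $params")
    .setMaster(params.master)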

5 answers
劳资没心,怎么记你
#2 · 2019-01-22 02:30

You can set the Spark master from the command line by adding the JVM parameter:

-Dspark.master=spark://myhost:7077
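
In IntelliJ this goes into the Run Configuration's "VM options" field. When launching with plain java it would look roughly like the following (the jar path is a placeholder, and Spark's dependencies must already be on the classpath):

  java -Dspark.master=spark://myhost:7077 \
    -cp /path/to/spark-examples.jar \
    org.apache.spark.examples.mllib.BinaryClassification \
    --algorithm LR --regType L2 --regParam 1.0 \
    data/mllib/sample_binary_classification_data.txt

This works because a new SparkConf() loads any spark.* Java system properties by default, so the master reaches the example without source changes.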
一夜七次
#3 · 2019-01-22 02:30

If you want to do this from code, you can use .setMaster(...) when creating the SparkConf:

val conf = new SparkConf().setAppName("Simple Application")
                          .setMaster("spark://myhost:7077")


Long overdue EDIT (as per the comments)

For the SparkSession in Spark 2.x+:

val spark = SparkSession.builder()
                        .appName("app_name")
                        .getOrCreate()
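
The 2.x builder also exposes a master(...) setter if you do want to pin it in code; a minimal sketch, with a placeholder URL:

  val spark = SparkSession.builder()
                          .appName("app_name")
                          .master("spark://myhost:7077")
                          .getOrCreate()

A master set in code takes precedence over one passed at launch, so leave it out if you want to keep choosing the master from the command line.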

Command line (2.x), assuming a local standalone cluster:

spark-shell --master spark://localhost:7077 
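
spark-submit accepts the same --master flag, so the bundled examples can be pointed at any master without code changes (the jar path is a placeholder):

  spark-submit --master spark://localhost:7077 \
    --class org.apache.spark.examples.mllib.BinaryClassification \
    /path/to/spark-examples.jar \
    --algorithm LR --regType L2 --regParam 1.0 \
    data/mllib/sample_binary_classification_data.txt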
Ridiculous、
#4 · 2019-01-22 02:30

So here is the solution.

  1. Set it to local, which runs with a single thread:

    new SparkConf().setAppName("Ravi Macha").setMaster("local")
    
  2. Or pass an argument, i.e. the number of threads in brackets:

    new SparkConf().setAppName("Ravi Macha").setMaster("local[2]") 
    
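There is also local[*], which uses as many worker threads as there are logical cores on the machine:

    new SparkConf().setAppName("Ravi Macha").setMaster("local[*]")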
劳资没心,怎么记你
#5 · 2019-01-22 02:44

I downloaded Spark 1.3.0 and wanted to test the Java samples using Eclipse Luna 4.4. It turns out that to run them you need to add spark-assembly-1.3.0-hadoop2.4.0.jar as a referenced library to your Java project.

The fastest way to start with Spark using Java is to run the JavaWordCount example. To fix the above issue, add the following line to the Spark configuration:

SparkConf sparkConf = new SparkConf().setAppName("JavaWordCount")
                                     .setMaster("local[2]")
                                     .set("spark.executor.memory", "1g");

And that's it. Try running it from Eclipse and it should succeed. If you see the error below:

java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.
    at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:318)

just ignore it; scroll down the console and you will see your input text file printed line by line, followed by the word counts.

This is a fast way to get started with Spark on Windows without worrying about installing Hadoop; you just need JDK 6 and Eclipse.
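
If you would rather silence the winutils warning (it is a Hadoop-on-Windows quirk, not a Spark setting), the usual workaround is to point hadoop.home.dir at a folder whose bin subdirectory contains winutils.exe, for example as a VM option (C:\hadoop is a placeholder path):

  -Dhadoop.home.dir=C:\hadoop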

smile是对你的礼貌
#6 · 2019-01-22 02:50

As the documentation for setMaster(String master) says:

The master URL to connect to, such as local to run locally with one thread, local[4] to run locally with 4 cores, or spark://master:7077 to run on a Spark standalone cluster.
