What does setMaster `local[*]` mean in Spark?

I found some code that starts Spark locally with:

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf().setAppName("test").setMaster("local[*]")
val ctx = new SparkContext(conf)

What does the [*] mean?

Tags: scala apache-spark

4 Answers

Bombasti

#2 · 2020-01-24 03:00

Some additional info:

Do not run Spark Streaming programs locally with the master configured as "local" or "local[1]". This allocates only one CPU for tasks, and if a receiver is running on it, there is no resource left to process the received data. Use at least "local[2]" to have more cores.

From Learning Spark: Lightning-Fast Big Data Analysis
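To make the receiver point concrete, here is a minimal sketch of a streaming word count that uses "local[2]", so one thread can run the receiver while the other processes batches. It assumes the spark-streaming dependency is on the classpath and that something is writing text to localhost:9999; the host, port, and object name are placeholders for illustration only.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object LocalStreamingSketch {
  // "local[2]": at least two threads, one for the socket receiver
  // and one left over to process the received data.
  val conf: SparkConf =
    new SparkConf().setAppName("streaming-test").setMaster("local[2]")

  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(conf, Seconds(1))

    // Assumed source: a text server on localhost:9999 (placeholder values).
    val lines = ssc.socketTextStream("localhost", 9999)

    lines.flatMap(_.split(" "))
      .map(word => (word, 1))
      .reduceByKey(_ + _)
      .print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```

With "local" or "local[1]" the same program would start, but the single thread would be held by the receiver and no batches would be processed.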


闹够了就滚

#3 · 2020-01-24 03:02

Master URL : Meaning

local : Run Spark locally with one worker thread (i.e. no parallelism at all).

local[K] : Run Spark locally with K worker threads (ideally, set this to the number of cores on your machine).

local[K,F] : Run Spark locally with K worker threads and F maxFailures (see spark.task.maxFailures for an explanation of this variable).

local[*] : Run Spark locally with as many worker threads as logical cores on your machine.

local[*,F] : Run Spark locally with as many worker threads as logical cores on your machine and F maxFailures.

spark://HOST:PORT : Connect to the given Spark standalone cluster master. The port must be whichever one your master is configured to use, which is 7077 by default.

spark://HOST1:PORT1,HOST2:PORT2 : Connect to the given Spark standalone cluster with standby masters managed by ZooKeeper. The list must include all the master hosts in the high-availability cluster set up with ZooKeeper. Each port must be whichever one that master is configured to use, which is 7077 by default.

mesos://HOST:PORT : Connect to the given Mesos cluster. The port must be whichever one your Mesos master is configured to use, which is 5050 by default. Or, for a Mesos cluster using ZooKeeper, use mesos://zk://.... To submit with --deploy-mode cluster, the HOST:PORT should be configured to connect to the MesosClusterDispatcher.

yarn : Connect to a YARN cluster in client or cluster mode depending on the value of --deploy-mode. The cluster location will be found based on the HADOOP_CONF_DIR or YARN_CONF_DIR variable.

https://spark.apache.org/docs/latest/submitting-applications.html
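Since the same application may be run with any of the master URLs above, one common pattern is to use "local[*]" only as a fallback instead of hard-coding it. The sketch below relies on SparkConf.setIfMissing, which leaves the master alone when one has already been supplied (for example by spark-submit --master). The helper name withDefaultMaster is made up for this illustration.

```scala
import org.apache.spark.SparkConf

object MasterFallbackSketch {
  // Hypothetical helper: use "local[*]" only when no master was supplied.
  def withDefaultMaster(conf: SparkConf): SparkConf =
    conf.setIfMissing("spark.master", "local[*]")

  def main(args: Array[String]): Unit = {
    val conf = withDefaultMaster(new SparkConf().setAppName("test"))
    println(s"effective master = ${conf.get("spark.master")}")
  }
}
```

A plain setMaster("local[*]") in code would instead win over the --master flag, because properties set directly on SparkConf take precedence over spark-submit options.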


等我变得足够好

#4 · 2020-01-24 03:09

From the doc:

./bin/spark-shell --master local[2]

The --master option specifies the master URL for a distributed cluster, or local to run locally with one thread, or local[N] to run locally with N threads. You should start by using local for testing.

And from here:

local[*] : Run Spark locally with as many worker threads as logical cores on your machine.
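A quick way to see what "local[*]" resolves to on your own machine is to compare the number of processors the JVM reports with the context's default parallelism. This is only a sketch; the object and method names are made up, and the two numbers are expected to agree only when spark.default.parallelism has not been set explicitly.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object LocalStarCheck {
  // Returns (master URL, JVM-visible processors, default parallelism).
  def probe(): (String, Int, Int) = {
    val conf = new SparkConf().setAppName("local-star-check").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      (sc.master, Runtime.getRuntime.availableProcessors(), sc.defaultParallelism)
    } finally {
      sc.stop() // always release the local context
    }
  }

  def main(args: Array[String]): Unit = {
    val (master, cores, parallelism) = probe()
    println(s"master = $master")
    println(s"JVM-visible processors = $cores")
    println(s"defaultParallelism = $parallelism")
  }
}
```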


chillily

#5 · 2020-01-24 03:10

Master URL

You can run Spark in local mode using local, local[n] or the most general local[*] for the master URL.

The URL says how many threads can be used in total:

local uses 1 thread only.

local[n] uses n threads.

local[*] uses as many threads as the number of processors available to the Java virtual machine (it uses Runtime.getRuntime.availableProcessors() to know the number).

local[N, maxFailures] (called local-with-retries), where N is * or the number of threads to use (as explained above) and maxFailures is the value of spark.task.maxFailures.
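The rules above can be sketched as a small parser in plain Scala. This is a simplified illustration, not Spark's actual source: the names LocalSpec and LocalMasterSketch and the regular expressions are assumptions made for this example, and maxFailures is left empty when the URL does not specify it.

```scala
// Hypothetical result type: thread count plus optional maxFailures.
final case class LocalSpec(threads: Int, maxFailures: Option[Int])

object LocalMasterSketch {
  // Simplified patterns for local[N] and local[N, maxFailures], N being a number or *.
  private val LocalN = """local\[([0-9]+|\*)\]""".r
  private val LocalNFailures = """local\[([0-9]+|\*)\s*,\s*([0-9]+)\]""".r

  // "*" means: as many threads as processors available to the JVM.
  private def threadCount(spec: String): Int =
    if (spec == "*") Runtime.getRuntime.availableProcessors() else spec.toInt

  /** Resolves a local master URL; returns None for anything that is not local. */
  def resolve(master: String): Option[LocalSpec] = master match {
    case "local"              => Some(LocalSpec(1, None))
    case LocalN(n)            => Some(LocalSpec(threadCount(n), None))
    case LocalNFailures(n, f) => Some(LocalSpec(threadCount(n), Some(f.toInt)))
    case _                    => None
  }
}
```

For example, LocalMasterSketch.resolve("local[4]") yields a LocalSpec with 4 threads, while resolve("local[*]") depends on the machine it runs on.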

