How multiple executors are managed on the worker n

2019-06-09 16:36发布

问题:

Until now, I have only used Spark on a Hadoop cluster with YARN as the resource manager. In that type of cluster, I know exactly how many executors to run and how the resource management works. However, know that I am trying to use a Standalone Spark Cluster, I have got a little bit confused. Correct me where I am wrong.

From this article, by default, a worker node uses all the memory of the node minus 1 GB. But I understand that by using SPARK_WORKER_MEMORY, we can use lesser memory. For example, if the total memory of the node is 32 GB, but I specify 16 GB, Spark worker is not going to use anymore than 16 GB on that node?

But what about executors? Let us say if I want to run 2 executors per node, can I do that by specifying executor memory during spark-submit to be half of SPARK_WORKER_MEMORY, and if I want to run 4 executors per node, by specifying executor memory to be the quarter of SPARK_WORKER_MEMORY?

If so, besides executor memory, I would also have to specify executor cores correctly, I think. For example, if I want to run 4 executors on a worker, I would have to specify executor cores to be the quarter of SPARK_WORKER_CORES? What happens, if I specify a bigger number than that? I mean if I specify executor memory to be the quarter of SPARK_WORKER_MEMORY, but executor cores to be only half of SPARK_WORKER_CORES? Would I get 2 or 4 executors running on that node in that case?

回答1:

This is the best way to control number of executors, cores and memory in my experience.

  • You can set total number of cores across all executors and number of cores per each executor, you can set executor memory

    --total-executor-cores 12 --executor-cores 2 --executor-memory 6G

    This would give you 6 executors and 2 cores/6G per each executor, so in total you are looking at 12 Cores and 36G

  • You can set driver memory using

    --driver-memory 2G



回答2:

So, I experimented with the Spark Standalone cluster myself a bit, and this is what I noticed.

  1. My intuition that muliple executors can be run inside a worker, by tuning executor cores was indeed correct. Let us say, your worker has 16 cores. Now if you specify 8 cores for executors, Spark would run 2 executors per worker.

  2. How many executors run inside a worker also depend upon the executor memory you specify. For example, if worker memory is 24 GB, and you want to run 2 executors per worker, you cannot specify executor memory to be more than 12 GB.

  3. A worker's memory can be limited when starting a slave by specifing the value for optional parameter--memory or by changing the value of SPARK_WORKER_MEMORY. Same with the number of cores (--cores/SPARK_WORKER_CORES).

If you want to be able to run multiple jobs on the Standalone Spark cluster, you could use the spark.cores.max configuration property while doing spark-submit. For example, like this.

spark-submit <other parameters> --conf="spark.cores.max=16" <other parameters>

So, if your Standalone Spark Cluster allows 64 cores in total, and you give only 16 cores to your program, other Spark jobs could use the remaining 48 cores.