Until now, I have only used Spark on a Hadoop cluster with YARN as the resource manager. In that type of cluster, I know exactly how many executors to run and how the resource management works. However, know that I am trying to use a Standalone Spark Cluster, I have got a little bit confused. Correct me where I am wrong.
From this article, by default, a worker node uses all the memory of the node minus 1 GB. But I understand that by using SPARK_WORKER_MEMORY
, we can use lesser memory. For example, if the total memory of the node is 32 GB, but I specify 16 GB, Spark worker is not going to use anymore than 16 GB on that node?
But what about executors? Let us say if I want to run 2 executors per node, can I do that by specifying executor memory during spark-submit
to be half of SPARK_WORKER_MEMORY
, and if I want to run 4 executors per node, by specifying executor memory to be the quarter of SPARK_WORKER_MEMORY
?
If so, besides executor memory, I would also have to specify executor cores correctly, I think. For example, if I want to run 4 executors on a worker, I would have to specify executor cores to be the quarter of SPARK_WORKER_CORES
? What happens, if I specify a bigger number than that? I mean if I specify executor memory to be the quarter of SPARK_WORKER_MEMORY
, but executor cores to be only half of SPARK_WORKER_CORES
? Would I get 2 or 4 executors running on that node in that case?
This is the best way to control number of executors, cores and memory in my experience.
You can set total number of cores across all executors and number of cores per each executor, you can set executor memory
--total-executor-cores 12 --executor-cores 2 --executor-memory 6G
This would give you 6 executors and 2 cores/6G per each executor, so in total you are looking at 12 Cores and 36G
You can set driver memory using
--driver-memory 2G
So, I experimented with the Spark Standalone cluster myself a bit, and this is what I noticed.
My intuition that muliple executors can be run inside a worker, by tuning executor cores was indeed correct. Let us say, your worker has 16 cores. Now if you specify 8 cores for executors, Spark would run 2 executors per worker.
How many executors run inside a worker also depend upon the executor memory you specify. For example, if worker memory is 24 GB, and you want to run 2 executors per worker, you cannot specify executor memory to be more than 12 GB.
A worker's memory can be limited when starting a slave by specifing the value for optional parameter
--memory
or by changing the value ofSPARK_WORKER_MEMORY
. Same with the number of cores (--cores
/SPARK_WORKER_CORES
).If you want to be able to run multiple jobs on the Standalone Spark cluster, you could use the
spark.cores.max
configuration property while doingspark-submit
. For example, like this.So, if your Standalone Spark Cluster allows 64 cores in total, and you give only 16 cores to your program, other Spark jobs could use the remaining 48 cores.