I did my homework and read the documentation at https://spark.apache.org/docs/latest/configuration.html.
In spark-folder/conf/spark-env.sh:
- SPARK_DRIVER_MEMORY, Memory for Master (e.g. 1000M, 2G) (Default: 512 MB)
- SPARK_EXECUTOR_MEMORY, Memory per Worker (e.g. 1000M, 2G) (Default: 1G)
- SPARK_WORKER_MEMORY, to set how much total memory workers have to give executors (e.g. 1000m, 2g)
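For example, I could set all three in spark-env.sh like this (values picked only for illustration):

    # spark-folder/conf/spark-env.sh
    export SPARK_DRIVER_MEMORY=2g     # memory for the driver process
    export SPARK_EXECUTOR_MEMORY=2g   # memory per executor
    export SPARK_WORKER_MEMORY=8g     # total memory a worker can give to executors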
What is the relationship between the above 3 parameters?
As I understand it, SPARK_DRIVER_MEMORY is the maximum memory the master node/process can request. But what about the driver in a multi-machine setup, e.g. 1 master machine and 2 worker machines: should the worker machines also have some memory available for the Spark driver?
SPARK_EXECUTOR_MEMORY and SPARK_WORKER_MEMORY look the same to me, just with different names. Could this also be explained?
Thank you very much.
First, you should know that 1 Worker (you can say 1 machine or 1 Worker node) can launch multiple Executors (or multiple Worker Instances, the term used in the docs).
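For example, in standalone mode you can run more than one worker process per machine via SPARK_WORKER_INSTANCES in spark-env.sh (a minimal sketch; the values are only illustrative):

    # spark-folder/conf/spark-env.sh (standalone mode)
    export SPARK_WORKER_INSTANCES=2   # launch two worker processes on this machine
    export SPARK_WORKER_CORES=4       # cores each worker can hand out to executors
    export SPARK_WORKER_MEMORY=8g     # memory each worker can hand out to executors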
SPARK_WORKER_MEMORY is only used in standalone deploy mode.

SPARK_EXECUTOR_MEMORY is used in YARN deploy mode.

In Standalone mode, you set SPARK_WORKER_MEMORY to the total amount of memory that can be used on one machine (all Executors on this machine) to run your Spark applications.
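As a concrete sketch (illustrative values only): with the settings below, one machine offers 8g in total, so it could host up to four 2g Executors.

    # spark-env.sh on a standalone worker machine
    export SPARK_WORKER_MEMORY=8g     # total memory this worker can give to executors
    export SPARK_EXECUTOR_MEMORY=2g   # each executor gets 2g, so at most 4 executors fit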
In contrast, in YARN mode, you set SPARK_EXECUTOR_MEMORY to the memory of one Executor.

SPARK_DRIVER_MEMORY is used in YARN deploy mode, specifying the memory for the Driver that runs your application and communicates with the Cluster Manager.
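For example, on YARN these are usually passed per application through spark-submit rather than spark-env.sh (a sketch; the application jar and values are only illustrative):

    # YARN deploy mode
    # --driver-memory   corresponds to SPARK_DRIVER_MEMORY   (memory for the Driver)
    # --executor-memory corresponds to SPARK_EXECUTOR_MEMORY (memory of one Executor)
    spark-submit \
      --master yarn \
      --driver-memory 2g \
      --executor-memory 4g \
      --num-executors 10 \
      your-app.jar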