Resource Allocation with Spark and Yarn

Published 2019-07-25 13:58

I am using Zeppelin 0.7.3 with Spark 2.3 in yarn-client mode. My settings are:

Spark:

spark.driver.memory 4096m

spark.driver.memoryOverhead 3072m

spark.executor.memory 4096m

spark.executor.memoryOverhead 3072m

spark.executor.cores 3

spark.executor.instances 3

Yarn:

Minimum allocation: memory:1024, vCores:2

Maximum allocation: memory:9216, vCores:6

The application started by Zeppelin gets the following resources:

Running Containers 4

Allocated CPU VCores 4

Allocated Memory MB 22528

[Screenshot: YARN allocation]
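
The same three numbers can also be pulled from the ResourceManager REST API instead of the web UI. Below is a minimal Scala sketch; the ResourceManager address and the application id are placeholders, not values taken from this cluster.

import scala.io.Source

// Sketch: read an application's allocated resources from the YARN
// ResourceManager REST API. Host, port and application id are placeholders.
object YarnAppResources {
  def main(args: Array[String]): Unit = {
    val rm    = "http://resourcemanager-host:8088"  // assumption: default RM web port
    val appId = "application_1564000000000_0001"    // placeholder application id

    // The JSON response contains allocatedMB, allocatedVCores and
    // runningContainers for the application; print it as-is to stay dependency-free.
    val json = Source.fromURL(s"$rm/ws/v1/cluster/apps/$appId").mkString
    println(json)
  }
}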

  1. I don't quite understand the amount of memory allocated by YARN. Given the settings, I would expect YARN to reserve (4096+3072)*4m = 28672m. However, it looks like the spark.executor.memoryOverhead option is ignored (I also tried spark.yarn.executor.memoryOverhead, with no effect), so only the default minimum overhead of 384m is applied. Since the minimum allocation is set to 1024m, we end up with (4096+3072)m + 3*(4096+1024)m = 22528m, where the first term is the driver and the second term sums up the executor memory (a short calculation reproducing this figure follows the question).

  2. Why are only 4 CPU VCores allocated, even though the minimum allocation is set to 2 vCores and I requested 3 cores per executor? Looking at the Application Master, I find the following executors:

[Screenshot: Spark executor allocation]

Here, the executors indeed have 3 cores each. How do I know which value is the correct one, or what am I missing?

  3. I tried a couple of settings; in yarn-client mode one is supposed to use options such as spark.yarn.am.memory or spark.yarn.am.cores, yet those appear to be ignored by YARN. Why is that? Additionally, in yarn-client mode the driver is supposed to run outside of YARN, so why are its resources still allocated within YARN? My Zeppelin instance runs on the same machine as one of the workers.
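
To make point 1 of the question concrete, here is a minimal sketch of that arithmetic. It assumes the executor overhead falls back to Spark's default of max(384m, 10% of executor memory) and that YARN rounds every container request up to a multiple of the 1024m minimum allocation; both are assumptions about this cluster, not facts confirmed from the YARN logs.

// Sketch of the memory arithmetic from point 1 of the question.
object YarnMemoryMath {

  // Assumption: YARN rounds each container request up to the next
  // multiple of yarn.scheduler.minimum-allocation-mb.
  def roundUp(mb: Int, incrementMb: Int): Int =
    ((mb + incrementMb - 1) / incrementMb) * incrementMb

  def main(args: Array[String]): Unit = {
    val minAllocMb         = 1024
    val driverMb           = 4096 + 3072                      // driver memory + overhead
    val executorMb         = 4096
    val executorOverheadMb = math.max(384, executorMb / 10)   // default fallback, ~410m

    val driverContainer   = roundUp(driverMb, minAllocMb)                        // 7168
    val executorContainer = roundUp(executorMb + executorOverheadMb, minAllocMb) // 5120

    val total = driverContainer + 3 * executorContainer                          // 22528
    println(s"driver container:    $driverContainer MB")
    println(s"executor containers: 3 x $executorContainer MB")
    println(s"total allocated:     $total MB")
  }
}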

1 Answer
狗以群分 · 2019-07-25 14:35

A Spark application has three roles: the driver, the application master, and the executors.

  1. In client mode (one of the deploy modes), the driver itself does not request resources from YARN, so only the application master and the three executors need containers allocated by YARN. Spark will therefore ask for (4G + 3G) * 3 for the three executors and 1G for the AM, so the allocated memory comes to 22GB (22528MB).

  2. As for the number of cores, I believe the Spark UI gives the correct answer, based on my experience.
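
When the YARN UI and the Spark UI disagree on vcores, one common cause (an assumption here, not something confirmed in this thread) is that the CapacityScheduler's DefaultResourceCalculator schedules on memory only, so the YARN UI counts one vcore per container regardless of what was requested. In any case, the values the executors actually received can be checked from the Zeppelin session itself. A minimal sketch, assuming the standard %spark interpreter where sc is the predefined SparkContext:

// Run in a Zeppelin %spark paragraph to see what the running application
// actually uses, independent of what either UI reports.
val conf = sc.getConf
println("spark.executor.cores     = " + conf.getOption("spark.executor.cores").getOrElse("<unset>"))
println("spark.executor.memory    = " + conf.getOption("spark.executor.memory").getOrElse("<unset>"))
println("spark.executor.instances = " + conf.getOption("spark.executor.instances").getOrElse("<unset>"))

// Live executors (the list includes the driver) and the default parallelism,
// which on YARN reflects the total cores of the registered executors.
println("executors (incl. driver) = " + sc.statusTracker.getExecutorInfos.length)
println("defaultParallelism       = " + sc.defaultParallelism)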
