I am using Zeppelin 0.7.3 with Spark 2.3 in yarn-client mode. My settings are:
Spark:
spark.driver.memory 4096m
spark.driver.memoryOverhead 3072m
spark.executor.memory 4096m
spark.executor.memoryOverhead 3072m
spark.executor.cores 3
spark.executor.instances 3
YARN:
Minimum allocation: memory:1024, vCores:2
Maximum allocation: memory:9216, vCores:6
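To double-check which of these properties actually reach the driver, I can run the following in a Zeppelin paragraph (sc is the SparkContext that Zeppelin provides; getOption returns None for keys that were never set):

// Print the properties the driver actually received.
Seq(
  "spark.driver.memory",
  "spark.executor.memory",
  "spark.executor.memoryOverhead",      // Spark 2.3+ name
  "spark.yarn.executor.memoryOverhead", // pre-2.3 name
  "spark.executor.cores",
  "spark.executor.instances"
).foreach { key =>
  println(s"$key = ${sc.getConf.getOption(key).getOrElse("<not set>")}")
}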
The application started by Zeppelin gets the following resources:
Running Containers 4
Allocated CPU VCores 4
Allocated Memory MB 22528
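For what it's worth, the same numbers can also be pulled from the ResourceManager REST API; the host and port below are placeholders for my cluster:

// Ask the RM for its view of this application's allocation.
val rmHost = "http://my-resourcemanager:8088" // placeholder: RM webapp on the default port
val appId = sc.applicationId
val json = scala.io.Source.fromURL(s"$rmHost/ws/v1/cluster/apps/$appId").mkString
println(json) // the response includes allocatedMB, allocatedVCores and runningContainers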
I don't quite understand the amount of memory allocated by YARN. Given the settings, I would expect YARN to reserve (4096+3072)m * 4 = 28672m. However, it looks like the spark.executor.memoryOverhead option is ignored (I also tried spark.yarn.executor.memoryOverhead, with no effect). Instead, the default overhead of max(384m, 10% of executor memory) ≈ 410m seems to be applied, and since the minimum allocation is 1024m, YARN rounds each executor container up to the next 1024m multiple. That gives (4096+3072)m * 1 + (4096+1024)m * 3 = 22528m, where the first term is the driver container and the second term sums the three executor containers.
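As a sketch of the sizing I think is happening (Spark's default overhead is max(384m, 10% of executor memory), and YARN normalizes every container request up to a multiple of the minimum allocation):

val minAllocMb = 1024
def roundUp(mb: Int, step: Int): Int = ((mb + step - 1) / step) * step

// Executor: the requested overhead is apparently ignored, so the default applies.
val executorMemMb = 4096
val defaultOverheadMb = math.max(384, (0.10 * executorMemMb).toInt)              // 409
val executorContainerMb = roundUp(executorMemMb + defaultOverheadMb, minAllocMb) // 5120

// Driver: here the 3072m overhead *was* honored.
val driverContainerMb = roundUp(4096 + 3072, minAllocMb) // 7168

println(driverContainerMb + 3 * executorContainerMb) // 22528, matching what the RM reports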
Why are only 4 CPU VCores allocated, even though I requested 3 cores per executor and the minimum allocation is set to 2 vCores per container? When looking at the Application Master UI, I find the following executors:
Here, the executors indeed have 3 cores each. How do I know which value is the correct one, and what am I missing?
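One cross-check I can do from inside the application itself (assuming, as I understand it, that defaultParallelism on YARN defaults to the total number of executor cores):

val execCores = sc.getConf.getOption("spark.executor.cores").getOrElse("<not set>")
val numExecutors = sc.getExecutorMemoryStatus.size - 1 // this map includes the driver
println(s"spark.executor.cores = $execCores, running executors = $numExecutors")
// On YARN, sc.defaultParallelism defaults to max(total executor cores, 2),
// so a value of 9 would confirm 3 executors with 3 cores each.
println(s"defaultParallelism = ${sc.defaultParallelism}")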
- I tried a couple of settings: in yarn-client mode one is supposed to use options such as spark.yarn.am.memory or spark.yarn.am.cores, but those also seem to be ignored by YARN. Why is that? Additionally, in yarn-client mode the driver is supposed to run outside of YARN, so why are its resources still allocated within YARN? My Zeppelin instance runs on the same machine as one of the workers. The options I tried are listed below.
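For reference, this is roughly how I set the AM options in the interpreter settings (values illustrative; as far as I know these have to be in place before the interpreter starts, since the AM container cannot be resized afterwards):

spark.yarn.am.memory 4096m
spark.yarn.am.memoryOverhead 3072m
spark.yarn.am.cores 2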