I am running my job on an AWS EMR cluster. It is a 40-node cluster of cr1.8xlarge instances, each with 240 GB of memory and 32 cores. I can run with either of the following configurations:
--driver-memory 180g --driver-cores 26 --executor-memory 180g --executor-cores 26 --num-executors 40 --conf spark.default.parallelism=4000
or
--driver-memory 180g --driver-cores 26 --executor-memory 90g --executor-cores 13 --num-executors 80 --conf spark.default.parallelism=4000
Judging from the job-tracker web UI, the number of tasks running simultaneously is mainly determined by the number of cores (CPUs) available. So I am wondering: are there any advantages, or specific scenarios, in which we would want more than one executor per node?
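To make the comparison concrete, here is a rough Python sketch of the arithmetic behind the two configurations above. The `Layout` class and its field names are purely illustrative (they are not part of Spark), and it assumes the executors are spread evenly across the 40 nodes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layout:
    """Hypothetical helper describing one --num-executors/--executor-* choice."""
    num_executors: int
    executor_cores: int
    executor_memory_gb: int

    @property
    def total_task_slots(self) -> int:
        # Upper bound on concurrently running tasks (one task per core).
        return self.num_executors * self.executor_cores

    @property
    def total_executor_memory_gb(self) -> int:
        # Sum of the executor heaps requested across the cluster.
        return self.num_executors * self.executor_memory_gb

    def executors_per_node(self, nodes: int) -> float:
        # Assumes executors are distributed evenly over the nodes.
        return self.num_executors / nodes


NODES = 40
one_per_node = Layout(num_executors=40, executor_cores=26, executor_memory_gb=180)
two_per_node = Layout(num_executors=80, executor_cores=13, executor_memory_gb=90)

for name, layout in [("first config", one_per_node), ("second config", two_per_node)]:
    print(
        name,
        "| executors per node:", layout.executors_per_node(NODES),
        "| task slots:", layout.total_task_slots,
        "| executor memory (GB):", layout.total_executor_memory_gb,
    )
```

Both layouts request the same total number of cores and the same total executor memory, so the only thing that differs is how large each executor's JVM is.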
Thanks!
Yes, there are advantages to running multiple executors per node, especially on large instances like yours. I recommend reading this blog post from Cloudera.
A snippet of the post that would be of particular interest to you: