Yarn: How to utilize full cluster resources?

Published 2020-07-18 06:48

Question:

So I have a Cloudera cluster with 7 worker nodes, each with:

  • 30GB RAM
  • 4 vCPUs

Here are some of the configurations which I found important (from Google) for tuning the performance of my cluster; a sketch of the corresponding config files follows the list. I am running with:

  • yarn.nodemanager.resource.cpu-vcores => 4
  • yarn.nodemanager.resource.memory-mb => 17 GB (the rest is reserved for the OS and other processes)
  • mapreduce.map.memory.mb => 2GB
  • mapreduce.reduce.memory.mb => 2GB
  • Running nproc => 4 (Number of processing units available)
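
For reference, here is roughly what those settings look like in the underlying config files (yarn-site.xml and mapred-site.xml; on a Cloudera cluster they are normally managed through Cloudera Manager rather than edited by hand). The MB values are just my conversion of the figures above:

    <!-- yarn-site.xml: resources each NodeManager advertises to YARN -->
    <property>
      <name>yarn.nodemanager.resource.cpu-vcores</name>
      <value>4</value>
    </property>
    <property>
      <name>yarn.nodemanager.resource.memory-mb</name>
      <value>17408</value> <!-- ~17 GB; the rest is left for the OS and other processes -->
    </property>

    <!-- mapred-site.xml: per-task container sizes -->
    <property>
      <name>mapreduce.map.memory.mb</name>
      <value>2048</value>
    </property>
    <property>
      <name>mapreduce.reduce.memory.mb</name>
      <value>2048</value>
    </property>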

Now my concern is: when I look at my ResourceManager UI, I see Available Memory as 119 GB, which is fine. But when I run a heavy Sqoop job and my cluster is at its peak, it uses only ~59 GB of memory, leaving ~60 GB unused.

One way I can see to fix this unused-memory issue is to increase mapreduce.map|reduce.memory.mb to 4 GB, so that we can use up to 16 GB per node.
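
If I did go this route, the change (per job, or cluster-wide in mapred-site.xml) would look roughly like this; the -Xmx heap values of about 80% of the container size are an assumption on my part, not something I have tested:

    <property>
      <name>mapreduce.map.memory.mb</name>
      <value>4096</value>
    </property>
    <property>
      <!-- JVM heap must stay below the container size; ~80% is a common rule of thumb -->
      <name>mapreduce.map.java.opts</name>
      <value>-Xmx3276m</value>
    </property>
    <property>
      <name>mapreduce.reduce.memory.mb</name>
      <value>4096</value>
    </property>
    <property>
      <name>mapreduce.reduce.java.opts</name>
      <value>-Xmx3276m</value>
    </property>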

The other way is to increase the number of containers, though I am not sure how to do that:

  • 4 cores x 7 nodes = 28 possible containers. 3 are being used by other processes, so only 25 are currently available for the Sqoop job.

What would be the right configuration to improve cluster performance in this case? Can I increase the number of containers, say to 2 containers per core, and is that recommended?

Any help or suggestions on the cluster configuration would be highly appreciated. Thanks.

Answer 1:

If your input data is in 26 splits, YARN will create 26 mappers to process those splits in parallel.

If you have 7 nodes running 2 GB mappers for 26 splits, the distribution should look something like this:

  • Node1 : 4 mappers => 8 GB
  • Node2 : 4 mappers => 8 GB
  • Node3 : 4 mappers => 8 GB
  • Node4 : 4 mappers => 8 GB
  • Node5 : 4 mappers => 8 GB
  • Node6 : 3 mappers => 6 GB
  • Node7 : 3 mappers => 6 GB
  • Total : 26 mappers => 52 GB

So, if all mappers run at the same time, the total memory used by your MapReduce job will be 26 x 2 = 52 GB. If you add the memory used by the reducer(s) and the ApplicationMaster container, you can reach your 59 GB at some point, as you said.
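
Just to illustrate the arithmetic (the reducer count here is assumed, and 1536 MB is the usual default for the yarn.app.mapreduce.am.resource.mb ApplicationMaster container, so treat the numbers as a sketch):

    26 \times 2\,\mathrm{GB}\ (\text{mappers}) \;+\; 3 \times 2\,\mathrm{GB}\ (\text{reducers, assumed}) \;+\; 1.5\,\mathrm{GB}\ (\text{AM}) \;\approx\; 59.5\,\mathrm{GB}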

If this is the behaviour you are witnessing, and the job is finished after those 26 mappers, then there is nothing wrong. You only need around 60 GB to complete your job by spreading tasks across all your nodes without needing to wait for container slots to free themselves. The other free 60 GB are just waiting around, because you don't need them. Increasing heap size just to use all the memory won't necessarily improve performance.

Edited:

However, if you still have lots of mappers waiting to be scheduled, then maybe it is because your installation is configured to calculate container allocation using vcores as well. This is not the default in Apache Hadoop, but it can be configured:

yarn.scheduler.capacity.resource-calculator: The ResourceCalculator implementation to be used to compare Resources in the scheduler. The default, i.e. org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator, only uses Memory, while DominantResourceCalculator uses dominant-resource to compare multi-dimensional resources such as Memory, CPU, etc. A Java ResourceCalculator class name is expected.
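
If you want the scheduler to consider memory only (or just want to check what it is currently using), that property lives in capacity-scheduler.xml (or the equivalent Cloudera Manager setting); a minimal sketch:

    <property>
      <name>yarn.scheduler.capacity.resource-calculator</name>
      <!-- DefaultResourceCalculator schedules on memory only;
           swap in DominantResourceCalculator to also account for vcores -->
      <value>org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator</value>
    </property>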

Since you set yarn.nodemanager.resource.cpu-vcores to 4, and each mapper uses 1 vcore by default, you can only run 4 mappers per node at a time.

In that case, you can double the value of yarn.nodemanager.resource.cpu-vcores to 8. It is a somewhat arbitrary value, but it should double the number of mappers that can run concurrently.
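
That change goes in yarn-site.xml on each NodeManager (or the corresponding Cloudera Manager setting); a sketch, using the illustrative value of 8 from above:

    <property>
      <!-- advertise 8 vcores instead of 4, so up to 8 one-vcore containers
           can be scheduled per node (assuming memory allows) -->
      <name>yarn.nodemanager.resource.cpu-vcores</name>
      <value>8</value>
    </property>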