MapReduce shuffle/sort method

Posted 2019-01-18 12:29

Question:

Somewhat of an odd question, but does anyone know which sorting algorithm MapReduce uses in the sort phase of shuffle/sort? I would guess merge sort or insertion sort (in keeping with the whole MapReduce paradigm), but I'm not sure.

Answer 1:

It's Quicksort. Afterwards, the sorted intermediate outputs are merged together. The Quicksort implementation tracks its recursion depth and gives up when the recursion gets too deep; in that case it falls back to Heapsort.
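
The quicksort-with-heapsort-fallback scheme described above is commonly called introsort. Below is a minimal Python sketch of that idea plus the follow-up merge of sorted runs. It is an illustration under stated assumptions, not Hadoop's code: the pivot rule (middle element, Lomuto partition), the depth limit of 2 * floor(log2 n), and the names introsort, merge_runs and sort_and_merge (a hypothetical stand-in for "sort each spill, then merge the spills") are all my own choices.

```python
import heapq
import math
from typing import List, Optional, Sequence


def _heapsort(a: List[int], lo: int, hi: int) -> None:
    """Sort a[lo:hi] in place with a max-heap; this is the deep-recursion fallback."""
    n = hi - lo

    def sift_down(root: int, end: int) -> None:
        # root/end are offsets from lo; end is exclusive.
        while True:
            child = 2 * root + 1
            if child >= end:
                return
            if child + 1 < end and a[lo + child + 1] > a[lo + child]:
                child += 1
            if a[lo + root] >= a[lo + child]:
                return
            a[lo + root], a[lo + child] = a[lo + child], a[lo + root]
            root = child

    for start in range(n // 2 - 1, -1, -1):  # build the max-heap
        sift_down(start, n)
    for end in range(n - 1, 0, -1):  # move the current max to the back
        a[lo], a[lo + end] = a[lo + end], a[lo]
        sift_down(0, end)


def _partition(a: List[int], lo: int, hi: int) -> int:
    """Lomuto partition of a[lo:hi] around the middle element (assumed pivot rule)."""
    mid = (lo + hi - 1) // 2
    a[mid], a[hi - 1] = a[hi - 1], a[mid]
    pivot = a[hi - 1]
    store = lo
    for i in range(lo, hi - 1):
        if a[i] < pivot:
            a[i], a[store] = a[store], a[i]
            store += 1
    a[store], a[hi - 1] = a[hi - 1], a[store]
    return store


def _introsort(a: List[int], lo: int, hi: int, depth: int) -> None:
    while hi - lo > 1:
        if depth <= 0:  # recursion too deep: give up on quicksort
            _heapsort(a, lo, hi)
            return
        depth -= 1
        p = _partition(a, lo, hi)
        _introsort(a, lo, p, depth)  # recurse on the left part
        lo = p + 1  # keep looping on the right part


def introsort(a: List[int], max_depth: Optional[int] = None) -> None:
    """Hypothetical helper: quicksort with a depth limit, falling back to heapsort."""
    if len(a) < 2:
        return
    if max_depth is None:
        max_depth = 2 * math.floor(math.log2(len(a)))  # assumed depth limit
    _introsort(a, 0, len(a), max_depth)


def merge_runs(runs: Sequence[List[int]]) -> List[int]:
    """k-way merge of already-sorted runs using a min-heap of (value, run, index)."""
    heap = [(run[0], i, 0) for i, run in enumerate(runs) if run]
    heapq.heapify(heap)
    out: List[int] = []
    while heap:
        value, i, j = heapq.heappop(heap)
        out.append(value)
        if j + 1 < len(runs[i]):
            heapq.heappush(heap, (runs[i][j + 1], i, j + 1))
    return out


def sort_and_merge(records: Sequence[int], spill_size: int) -> List[int]:
    """Hypothetical stand-in for the map side: sort each spill, then merge all spills."""
    runs = []
    for start in range(0, len(records), spill_size):
        spill = list(records[start:start + spill_size])
        introsort(spill)
        runs.append(spill)
    return merge_runs(runs)


print(sort_and_merge([5, 3, 9, 1, 7, 2, 8], spill_size=3))  # [1, 2, 3, 5, 7, 8, 9]
```

The heapsort fallback only kicks in when partitioning keeps producing lopsided splits. With this simple Lomuto pivot, for example, a list of 200 identical keys exhausts the depth limit after 14 partitions and finishes via heapsort, so the worst case stays O(n log n).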

Have a look at the QuickSort class:

org.apache.hadoop.util.QuickSort

You can change the sorting algorithm via the map.sort.class property in hadoop-default.xml.



Answer 2:

For a more in-depth explanation, see the post "Map-Reduce:Shuffle and sort" on my blog, Hadoop: Some Salient Understandings.