Somewhat of an odd question, but does anyone know what kind of sort MapReduce uses in the sort portion of shuffle/sort? I would think merge or insertion (in keeping with the whole MapReduce paradigm), but I'm not sure.
Answer 1:
It's Quicksort; afterwards, the sorted intermediate outputs are merged together. The Quicksort implementation checks the recursion depth and gives up when it gets too deep, in which case Heapsort is used instead.
Have a look at the Quicksort class:
org.apache.hadoop.util.QuickSort
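To make the depth check concrete, here is a minimal sketch of a Quicksort that falls back to Heapsort once the recursion gets too deep. This is not the Hadoop source: the class name DepthLimitedQuickSort, the depth limit of roughly 2 * log2(n), and the simple last-element partition are all assumptions chosen for illustration.

```java
import java.util.Arrays;

// Hypothetical illustration only; this is NOT org.apache.hadoop.util.QuickSort.
class DepthLimitedQuickSort {

    // Sorts the array in place.
    static void sort(int[] a) {
        // Assumed depth limit: about 2 * log2(n). The real limit may differ.
        int maxDepth = 2 * (32 - Integer.numberOfLeadingZeros(Math.max(a.length, 1)));
        quickSort(a, 0, a.length - 1, maxDepth);
    }

    // Sorts a[lo..hi] (inclusive) with Quicksort, giving up at depth 0.
    private static void quickSort(int[] a, int lo, int hi, int depth) {
        if (lo >= hi) {
            return;
        }
        if (depth == 0) {
            // Recursion too deep: hand this range to Heapsort instead.
            heapSort(a, lo, hi);
            return;
        }
        int p = partition(a, lo, hi);
        quickSort(a, lo, p - 1, depth - 1);
        quickSort(a, p + 1, hi, depth - 1);
    }

    // Lomuto partition using the last element as the pivot (chosen for brevity).
    private static int partition(int[] a, int lo, int hi) {
        int pivot = a[hi];
        int i = lo;
        for (int j = lo; j < hi; j++) {
            if (a[j] < pivot) {
                swap(a, i, j);
                i++;
            }
        }
        swap(a, i, hi);
        return i;
    }

    // Heapsort restricted to a[lo..hi] (inclusive).
    private static void heapSort(int[] a, int lo, int hi) {
        int n = hi - lo + 1;
        for (int i = n / 2 - 1; i >= 0; i--) {
            siftDown(a, lo, i, n);
        }
        for (int end = n - 1; end > 0; end--) {
            swap(a, lo, lo + end);   // move current maximum to its final slot
            siftDown(a, lo, 0, end); // restore the max-heap on the remainder
        }
    }

    // Sifts the element at heap index root down within a heap of size n
    // stored at a[base..base + n - 1].
    private static void siftDown(int[] a, int base, int root, int n) {
        while (true) {
            int child = 2 * root + 1;
            if (child >= n) {
                return;
            }
            if (child + 1 < n && a[base + child] < a[base + child + 1]) {
                child++;
            }
            if (a[base + root] >= a[base + child]) {
                return;
            }
            swap(a, base + root, base + child);
            root = child;
        }
    }

    private static void swap(int[] a, int i, int j) {
        int t = a[i];
        a[i] = a[j];
        a[j] = t;
    }

    public static void main(String[] args) {
        int[] data = {5, 3, 9, 1, 7};
        sort(data);
        System.out.println(Arrays.toString(data));
    }
}
```

With this naive pivot choice, an already-sorted input drives the recursion depth up quickly, so the Heapsort fallback is what keeps the worst case at O(n log n).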
You can change the algorithm used via the map.sort.class property, whose default is set in hadoop-default.xml. In newer Hadoop releases the property is named mapreduce.map.sort.class.
Answer 2:
To read about this in greater depth, see the post "Map-Reduce: Shuffle and sort" on my blog, "Hadoop: Some Salient Understandings".