Difference between partial sort, total sort and se

Posted 2020-08-01 06:07

Question:

What is the difference between partial sort, total sort, and secondary sort in Hadoop?

Answer 1:

Partial Sort:

The reducers produce many output files, each of which is sorted by key within itself. Nothing orders the keys across different files.

Total Sort:

The reducer output is a single file with all of the output sorted by key.

Secondary Sort:

In this case, we can control the ordering of the values as well as the keys. That is, sorting can be done on two or more fields.



Answer 2:

Partial Sort:

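The map output is partitioned among N reducers. Each reducer receives its own partition sorted by key and writes its own output file, so the job produces N files that are each sorted individually but are not ordered relative to one another.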
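To make the partial sort concrete, here is a minimal sketch in plain Python that only simulates the behaviour; it does not use Hadoop. The function name `partial_sort`, the sample records, and the use of `key % num_reducers` as a stand-in for a hash partitioner are all assumptions made for illustration.

```python
from collections import defaultdict


def partial_sort(records, num_reducers):
    """Simulate hash partitioning followed by a per-reducer sort by key."""
    partitions = defaultdict(list)
    for key, value in records:
        # Stand-in for a hash partitioner (assumption): the same key
        # always lands in the same partition.
        partitions[key % num_reducers].append((key, value))
    # Each simulated reducer sorts only its own partition.
    return [sorted(partitions[i], key=lambda kv: kv[0])
            for i in range(num_reducers)]


records = [(5, "e"), (2, "b"), (8, "h"), (1, "a"), (4, "d"), (7, "g")]
files = partial_sort(records, 2)
for i, part in enumerate(files):
    print(f"part-{i}:", [k for k, _ in part])
```

Each simulated output file is sorted on its own, but reading the files one after another does not give a globally sorted sequence of keys.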

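Total Sort:

All key-value pairs for a particular key reach the same reducer. This is decided by the partitioner, which runs on the map side. Combiners, also on the map side, act as "semi-reducers": they pre-aggregate the values for a key before they are sent to the reducer.

With a single reducer, the output is a single file with all of the output sorted by key. With several reducers, a total sort needs a partitioner that assigns ordered key ranges to the reducers (Hadoop's TotalOrderPartitioner does this), so that the output files are globally sorted when read in sequence.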
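The idea of a total sort across more than one reducer can be sketched in plain Python as a range partition followed by a per-partition sort. This is only a simulation and not Hadoop code; the function name `total_sort`, the hand-picked split point `4`, and the sample records are assumptions for illustration. In a real job the split points would normally come from sampling the input.

```python
import bisect


def total_sort(records, split_points):
    """Simulate a range partitioner: partition i only holds keys that are
    smaller than every key in partition i + 1."""
    partitions = [[] for _ in range(len(split_points) + 1)]
    for key, value in records:
        # Find the key range this key falls into (split_points is sorted).
        idx = bisect.bisect_right(split_points, key)
        partitions[idx].append((key, value))
    # Each simulated reducer sorts its own key range.
    return [sorted(p, key=lambda kv: kv[0]) for p in partitions]


records = [(5, "e"), (2, "b"), (8, "h"), (1, "a"), (4, "d"), (7, "g")]
files = total_sort(records, [4])  # keys < 4 go to part 0, the rest to part 1
all_keys = [k for part in files for k, _ in part]
print(all_keys)
```

Because the partitions cover ordered, non-overlapping key ranges, concatenating the sorted parts in order yields a fully sorted result, which is the same as what a single reducer would produce.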

Secondary Sort:

Used to define how map output keys are sorted; the ordering is applied to the map output before it reaches the reduce function. In this case, we can control the ordering of the values as well as the keys. That is, sorting can be done on two or more fields, typically by moving the field to sort on into a composite key.
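A common way to get a secondary sort is to sort on a composite key (natural key plus the field to order by), while partitioning and grouping on the natural key only. The plain-Python sketch below simulates that pattern without Hadoop; the function name `secondary_sort`, the toy partition function, and the sample city/temperature records are assumptions for illustration.

```python
from itertools import groupby


def secondary_sort(records, num_reducers):
    """Simulate secondary sort: partition and group by the natural key,
    but sort by the composite (natural key, value)."""
    partitions = [[] for _ in range(num_reducers)]
    for natural_key, value in records:
        # Partition on the natural key only, so every record for a key
        # reaches the same simulated reducer (toy deterministic hash).
        idx = sum(map(ord, natural_key)) % num_reducers
        partitions[idx].append((natural_key, value))
    grouped = {}
    for part in partitions:
        # Tuples compare field by field, so this sorts by key, then value.
        part.sort()
        # Group on the natural key only; values arrive already ordered.
        for natural_key, group in groupby(part, key=lambda kv: kv[0]):
            grouped[natural_key] = [value for _, value in group]
    return grouped


records = [("nyc", 31), ("sf", 18), ("nyc", 25), ("sf", 22), ("nyc", 28)]
result = secondary_sort(records, 2)
print(result)
```

The reduce step never sorts the values itself: they are already in order when each group is handed over, which is the point of the technique.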

Have a look at article1 and article2 and article3