I am trying to understand the various phases an MR job goes through. I read the online documentation for this.
Based on that, my understanding of the sequence is as below:
map() -> Partitioner -> Sorting (at mapper machine) -> Shuffle -> Sorting (at reducer machine) -> groupBy(Key) (at reducer machine) -> reduce()
Is this the correct sequence in which an MR job executes?
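To make my understanding concrete, here is a toy pure-Python simulation of that exact sequence for word count. This is not Hadoop code, just the data flow; the function names, the two-reducer setup, and the use of Python's hash() in place of Java's hashCode() are all my own illustrative choices.

```python
from itertools import groupby
import heapq

NUM_REDUCERS = 2

def map_fn(line):
    # map(): emit one (word, 1) pair per token
    for word in line.split():
        yield (word, 1)

def partition(key):
    # Stand-in for a hash partitioner: same key -> same partition
    return (hash(key) & 0x7FFFFFFF) % NUM_REDUCERS

splits = ["a b a", "b c"]

# map() -> Partitioner -> sorting at the mapper machine
mapper_outputs = []  # one {partition: sorted run} dict per map task
for split in splits:
    buckets = {r: [] for r in range(NUM_REDUCERS)}
    for kv in map_fn(split):
        buckets[partition(kv[0])].append(kv)
    for r in buckets:
        buckets[r].sort()  # each bucket is a run sorted by key
    mapper_outputs.append(buckets)

# Shuffle -> merge sort at the reducer -> groupBy(key) -> reduce()
result = {}
for r in range(NUM_REDUCERS):
    runs = [m[r] for m in mapper_outputs]   # shuffle: fetch my bucket from every mapper
    merged = list(heapq.merge(*runs))       # merge sorted runs into a single run
    for key, grp in groupby(merged, key=lambda kv: kv[0]):
        result[key] = sum(v for _, v in grp)  # reduce(): sum counts per key

# result == {"a": 2, "b": 2, "c": 1}
```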
Various phases of a MapReduce job:
Map phase:
- Reads its assigned input split from HDFS
- Parses the input into records as key-value pairs
- Applies the map function to each record
- Informs the master node of its completion
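The record-to-pairs step above can be sketched as a toy word-count mapper. This is plain Python, not the Hadoop Mapper API; the (offset, line) record shape mirrors the usual text-input convention but the function name is mine.

```python
# Toy word-count mapper: one input record (byte offset, text line)
# becomes zero or more (word, 1) key-value pairs
def map_fn(offset, line):
    for word in line.split():
        yield (word, 1)

pairs = list(map_fn(0, "the quick the"))
# pairs == [("the", 1), ("quick", 1), ("the", 1)]
```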
Partition phase:
- Each mapper determines which reducer will receive each of its outputs
- For any given key, the destination partition is the same, no matter which mapper produced it
- No. of partitions = No. of reducers
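The usual way to guarantee "same key, same partition" is hashing modulo the reducer count (what Hadoop's default HashPartitioner does). Below is a Python analogue; Python's hash() stands in for Java's hashCode(), so the exact bucket assignments differ from Hadoop's, but the invariants are the same.

```python
def partition(key, num_reducers):
    # Mask to a non-negative value, then take modulo the reducer count,
    # mirroring (hashCode & Integer.MAX_VALUE) % numReduceTasks
    return (hash(key) & 0x7FFFFFFF) % num_reducers

# Deterministic within a run: the same key always maps to the same partition,
# and the result is always a valid reducer index in [0, num_reducers)
p = partition("apple", 4)
```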
Shuffle phase:
- Fetches input data from all map tasks for the portion corresponding to the reduce task's bucket
Sort phase:
- Merge sorts all map outputs into a single run
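Because every map task ships a run already sorted by key, the reduce side only needs a k-way merge to get one sorted run. A minimal sketch using Python's heapq.merge (the runs and their contents here are made up for illustration):

```python
import heapq

# Two sorted runs, as fetched from two different map tasks during shuffle
run_from_mapper1 = [("apple", 1), ("cat", 1)]
run_from_mapper2 = [("banana", 1), ("cat", 1)]

# k-way merge of sorted runs into a single sorted run;
# equal keys end up adjacent, ready for grouping
merged = list(heapq.merge(run_from_mapper1, run_from_mapper2))
# merged == [("apple", 1), ("banana", 1), ("cat", 1), ("cat", 1)]
```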
Reduce phase:
- Applies the user-defined reduce function to the merged run
- Arguments are the key and its corresponding list of values
- Writes the output to a file in HDFS
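The grouping step works precisely because the merged run is sorted: equal keys are adjacent, so a single pass yields each key once with all of its values. A toy word-count reducer over such a run (names and data are illustrative, not the Hadoop Reducer API):

```python
from itertools import groupby

def reduce_fn(key, values):
    # Word-count style reducer: sum all counts for one key
    return (key, sum(values))

# Input is the sorted, merged run, so groupby sees each key exactly once
merged = [("apple", 1), ("cat", 1), ("cat", 1)]
output = [reduce_fn(k, [v for _, v in grp])
          for k, grp in groupby(merged, key=lambda kv: kv[0])]
# output == [("apple", 1), ("cat", 2)]
```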
Timeline of a MapReduce job:
- Map Phase: several Map Tasks are executed
- Reduce Phase: several Reduce Tasks are executed
Timeline for MapTask
Timeline for ReduceTask
Image source: https://www.slideshare.net/EmilioCoppa/hadoop-internals