Hadoop gen1 vs Hadoop gen2

Posted 2020-06-06 04:41

I am a bit confused about the place of the TaskTracker in Hadoop 2.x.

Daemons in Hadoop 1.x are namenode, datanode, jobtracker, tasktracker and secondarynamenode.

Daemons in Hadoop 2.x are namenode, datanode, resourcemanager, applicationmaster and secondarynamenode.

This means the JobTracker has been split up into the resourcemanager and the applicationmaster.

So where is tasktracker?

9 answers
我命由我不由天
Reply #2 · 2020-06-06 04:50

What I understood after reading the link above:

YARN addresses the shortcomings of classic MapReduce by splitting up the functionality of the JobTracker.

The two functions of the JobTracker in 1.x, i.e. resource management and job scheduling/monitoring, are divided into separate daemons: a global ResourceManager (RM) and a per-application ApplicationMaster (AM).

ResourceManager - runs on the master side, like the NameNode

  • it distributes resources among all applications

    it has 2 main components: Scheduler and ApplicationsManager.

  • the Scheduler is a pure scheduler: it allocates resources but does not monitor or track application status
  • the ApplicationsManager is responsible for accepting job submissions

NodeManager - runs on the slave side, like the DataNode

  • it is the per-machine framework agent
  • it is responsible for containers, monitoring their resource usage (CPU, memory, disk, network) and reporting the same to the ResourceManager/Scheduler

The central ResourceManager and the per-node NodeManagers together make up YARN.
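
To make the answer to "where is the TaskTracker?" explicit, here is a minimal Python sketch of how the Hadoop 1.x daemons line up with their closest Hadoop 2.x counterparts. The names `HADOOP1_TO_HADOOP2` and `successors` are made up for illustration and are not part of any Hadoop API. Note that the ApplicationMaster is a per-application process rather than a long-running daemon.

```python
from typing import Dict, List

# Hypothetical lookup table (not a Hadoop API): each Hadoop 1.x daemon
# mapped to the Hadoop 2.x component(s) that take over its role.
HADOOP1_TO_HADOOP2: Dict[str, List[str]] = {
    "NameNode": ["NameNode"],
    "SecondaryNameNode": ["SecondaryNameNode"],
    "DataNode": ["DataNode"],
    # Resource management goes to the ResourceManager; per-job
    # scheduling/monitoring goes to a per-application ApplicationMaster.
    "JobTracker": ["ResourceManager", "ApplicationMaster"],
    # The per-node agent role is taken over by the NodeManager.
    "TaskTracker": ["NodeManager"],
}


def successors(daemon: str) -> List[str]:
    """Return the Hadoop 2.x component(s) that replace a Hadoop 1.x daemon."""
    return HADOOP1_TO_HADOOP2[daemon]


print(successors("TaskTracker"))
print(successors("JobTracker"))
```

So the TaskTracker did not disappear; its per-node role is what the NodeManager does in Hadoop 2.x.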

劳资没心,怎么记你
Reply #3 · 2020-06-06 04:54

Hadoop 2 replaces the older MapReduce runtime with the YARN framework. YARN has a central ResourceManager component that manages cluster resources and allocates them to applications. Multiple applications can run on Hadoop via YARN, and they all share common resource management.
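
The idea of several applications drawing on one centrally managed pool can be sketched in a few lines of Python. This is only a toy model: the class `SharedCluster`, its methods, the memory figures and the application names are all assumptions made for illustration, and real YARN allocation involves queues, capacities and containers that are not modelled here.

```python
from typing import Dict


class SharedCluster:
    """Toy stand-in for a central resource manager (not the YARN API)."""

    def __init__(self, total_memory_mb: int) -> None:
        self.total_memory_mb = total_memory_mb
        self.allocated: Dict[str, int] = {}  # application name -> MB granted

    def free_memory_mb(self) -> int:
        return self.total_memory_mb - sum(self.allocated.values())

    def allocate(self, app: str, memory_mb: int) -> bool:
        """Grant the request only if the shared pool still has room."""
        if memory_mb > self.free_memory_mb():
            return False
        self.allocated[app] = self.allocated.get(app, 0) + memory_mb
        return True


cluster = SharedCluster(total_memory_mb=8192)
granted_a = cluster.allocate("app-A", 4096)  # fits in the empty pool
granted_b = cluster.allocate("app-B", 3072)  # fits in what app-A left over
granted_c = cluster.allocate("app-C", 2048)  # refused: only 1024 MB remain
print(granted_a, granted_b, granted_c, cluster.free_memory_mb())
```

All three applications go through the same allocator, which is the point of having one central resource manager instead of a scheduler baked into MapReduce.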

http://saphanatutorial.com/how-yarn-overcomes-mapreduce-limitations-in-hadoop-2-0/

【Aperson】
Reply #4 · 2020-06-06 04:59

Yes, the JobTracker was split into the ResourceManager and the ApplicationMaster. One ApplicationMaster runs per submitted job, so ApplicationMasters may be running on one or on many NodeManagers, depending on the number of jobs. When a job is submitted, the ResourceManager asks one of the free NodeManagers to host the ApplicationMaster; that ApplicationMaster then acts as the JobTracker for the job, and the other NodeManagers act as TaskTrackers, executing the tasks in YarnChild processes. Find details here: http://ercoppa.github.io/HadoopInternals/HadoopArchitectureOverview.html

淡お忘
Reply #5 · 2020-06-06 05:04

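In the Hadoop YARN architecture, the JobTracker's work has been split among three components: the Resource Manager, the Applications Manager (a part of the Resource Manager) and the Application Master. The TaskTracker's per-node role is taken over by the NodeManager.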

The fundamental idea of YARN is to split up the functionalities of resource management and job scheduling/monitoring into separate daemons. The idea is to have a global ResourceManager (RM) and per-application ApplicationMaster (AM). An application is either a single job or a DAG of jobs.

The ResourceManager has two main components: Scheduler and ApplicationsManager.

The Scheduler is responsible for allocating resources to the various running applications subject to familiar constraints of capacities, queues etc. The Scheduler is pure scheduler in the sense that it performs no monitoring or tracking of status for the application.

The ApplicationsManager is responsible for accepting job submissions, negotiating the first container for executing the application-specific ApplicationMaster, and providing the service for restarting the ApplicationMaster container on failure.

The per-application ApplicationMaster has the responsibility of negotiating appropriate resource containers from the Scheduler, tracking their status and monitoring for progress.
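
The division of labour described above can be sketched as a toy Python model. None of this is the YARN API: the classes only borrow the component names, and the node names, container counts and the first-fit placement rule are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class NodeManager:
    """Per-machine agent: launches containers on its own node."""
    name: str
    free_containers: int

    def launch(self, what: str) -> str:
        self.free_containers -= 1
        return f"{what}@{self.name}"


class Scheduler:
    """Pure scheduler: hands out containers, tracks no application status."""

    def __init__(self, nodes: List[NodeManager]) -> None:
        self.nodes = nodes

    def allocate(self) -> Optional[NodeManager]:
        # Assumed first-fit policy: pick the first node with a free container.
        for node in self.nodes:
            if node.free_containers > 0:
                return node
        return None


class ApplicationMaster:
    """Per-application: negotiates containers and tracks its own tasks."""

    def __init__(self, app_id: str, scheduler: Scheduler) -> None:
        self.app_id = app_id
        self.scheduler = scheduler
        self.tasks: List[str] = []

    def run_tasks(self, count: int) -> List[str]:
        for i in range(count):
            node = self.scheduler.allocate()
            if node is None:
                break  # cluster is full; a real AM would wait and retry
            self.tasks.append(node.launch(f"{self.app_id}-task{i}"))
        return self.tasks


class ApplicationsManager:
    """Accepts submissions and obtains the first container for the AM."""

    def __init__(self, scheduler: Scheduler) -> None:
        self.scheduler = scheduler

    def submit(self, app_id: str) -> ApplicationMaster:
        node = self.scheduler.allocate()
        if node is None:
            raise RuntimeError("no free container for the ApplicationMaster")
        node.launch(f"{app_id}-AM")  # the AM itself occupies a container
        return ApplicationMaster(app_id, self.scheduler)


nodes = [NodeManager("node1", 2), NodeManager("node2", 2)]
scheduler = Scheduler(nodes)
am = ApplicationsManager(scheduler).submit("job1")
placed = am.run_tasks(3)
print(placed)
```

In this run the ApplicationMaster takes one container on node1, and the three tasks fill the remaining container on node1 and both containers on node2. The Scheduler never learns whether a task succeeded; that bookkeeping lives in the ApplicationMaster.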

Have a look at documentation link

Have a look at this SE question for more details.

What additional benefit does Yarn bring to the existing map reduce?

家丑人穷心不美
Reply #6 · 2020-06-06 05:05

Yes, the JobTracker was split into the ResourceManager and the ApplicationMaster. One ApplicationMaster runs per submitted job, so ApplicationMasters may be running on one or on many NodeManagers, depending on the number of jobs. When a job is submitted, the ResourceManager asks one of the free NodeManagers to host the ApplicationMaster; that ApplicationMaster then acts as the JobTracker for the job, and the other NodeManagers act as TaskTrackers, executing the tasks in YarnChild processes. Correct me if I'm wrong.

等我变得足够好
Reply #7 · 2020-06-06 05:05
   Hadoop 1                                Hadoop 2
1. It is MapReduce 1                    1. It is MapReduce on YARN
2. It has a JobTracker and              2. It has a ResourceManager and
   TaskTrackers                            NodeManagers
3. It can send another task tracker     3. It can send resource manager,
                                           timeline server, which stores
                                           application history