Hadoop, MapReduce: how to add second node to mapRe

I have a Hadoop 2.2 cluster of 2 nodes. On the first machine I start:

namenode
datanode
NodeManager
ResourceManager
JobHistoryServer

On the second I start all those as well, except for namenode:

datanode
NodeManager
ResourceManager
JobHistoryServer

My mapred-site.xml on both machines contains:

<property>
  <name>mapred.job.tracker</name>
  <value>firstMachine:54311</value>
</property>

My core-site.xml on both machines contains:

<property>
   <name>fs.default.name</name>
   <value>hdfs://firstMachine:9000</value>
</property>

The console at http://firstMachine:50070 reports 2 nodes:

 Live Nodes     :   2 (Decommissioned: 0)

However, the console at http://firstMachine:8088 (the one that shows the MapReduce jobs, their history and so on) keeps saying:

Active Nodes: 1

Also, a MapReduce job runs with pretty much the same performance with or without the second machine. I tried it with the wordcount example, using 4 big files.

My question is: how can I check if my map reduce is actually executed on multiple (2 in this case) machines, and not just the one where it is launched?

If my Hadoop MapReduce in fact does NOT see the other Hadoop instance, how do I make it see it (how can I configure it to run the MapReduce job on 2 machines)?

OK, I've found the answer. Apparently in version 2.2 most (all?) of the stuff that was related to mapred has moved to YARN. So instead of using the mapred-site.xml file, I had to use the yarn-site.xml file and add this to it on both machines, so that the NodeManager on the second machine registers with the ResourceManager on the first:

<property>
 <name>yarn.resourcemanager.hostname</name>
 <value>firstMachine</value>
</property>

(Note that I didn't have to add the port; only the host is declared here, and the port keeps its default value.)

Now the console displays 2 active nodes, and the map/reduce job is about 20% faster.
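
For checking this without the web console, the node count can also be read from the ResourceManager itself. From a shell, yarn node -list prints the registered NodeManagers. The sketch below does roughly the same thing through the ResourceManager REST endpoint /ws/v1/cluster/nodes. It is only a sketch: the host name firstMachine and port 8088 are taken from the setup above, and the helper names running_nodes and fetch_nodes are made up for illustration.

```python
import json
from urllib.request import Request, urlopen


def running_nodes(payload):
    """Return host names of NodeManagers reported in state RUNNING."""
    # The Cluster Nodes API wraps its list as {"nodes": {"node": [...]}};
    # "nodes" can be null when no NodeManager has registered yet.
    nodes = (payload.get("nodes") or {}).get("node") or []
    return [n["nodeHostName"] for n in nodes if n.get("state") == "RUNNING"]


def fetch_nodes(rm_host="firstMachine", port=8088):
    """Fetch the node list from the ResourceManager (host/port are assumptions)."""
    url = "http://{}:{}/ws/v1/cluster/nodes".format(rm_host, port)
    # Ask for JSON explicitly rather than relying on the server default.
    req = Request(url, headers={"Accept": "application/json"})
    with urlopen(req, timeout=10) as resp:
        return json.load(resp)
```

Calling running_nodes(fetch_nodes()) on a correctly configured 2-node cluster should return two host names. If it returns only one, the second NodeManager has probably not registered with the ResourceManager.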
