I have a Hadoop 0.2.2 cluster of 2 nodes. On the first machine I start:
- namenode
- datanode
- NodeManager
- ResourceManager
- JobHistoryServer
On the second I start all those as well, except for namenode:
- datanode
- NodeManager
- ResourceManager
- JobHistoryServer
My mapred-site.xml on both machines contains:
<property>
<name>mapred.job.tracker</name>
<value>firstMachine:54311</value>
</property>
My core-site.xml on both machines contains:
<property>
<name>fs.default.name</name>
<value>hdfs://firstMachine:9000</value>
</property>
The console at http://firstMachine:50070 reports 2 nodes:
Live Nodes : 2 (Decommissioned: 0)
However, the console at http://firstMachine:8088 (the one with the MapReduce job history and all that) keeps saying:
Active Nodes: 1
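The same figure can probably be read from a script instead of the web page. Below is a minimal sketch, assuming the ResourceManager on firstMachine exposes the REST endpoint /ws/v1/cluster/nodes on port 8088 (Hadoop 2.x releases do; it is not confirmed for this particular setup). The helper names fetch_nodes and running_hosts are made up for illustration.

```python
import json
from urllib.request import urlopen


def fetch_nodes(rm_url="http://firstMachine:8088"):
    """Fetch the node list from the ResourceManager REST API.

    Assumption: the endpoint /ws/v1/cluster/nodes is available on this
    ResourceManager and returns JSON shaped like {"nodes": {"node": [...]}}.
    """
    with urlopen(rm_url + "/ws/v1/cluster/nodes", timeout=10) as resp:
        return json.load(resp)


def running_hosts(payload):
    """Return the sorted host names of all nodes whose state is RUNNING."""
    # "nodes" can be null when no NodeManager has registered at all.
    nodes = (payload.get("nodes") or {}).get("node") or []
    return sorted(
        n["nodeHostName"] for n in nodes if n.get("state") == "RUNNING"
    )
```

Calling print(running_hosts(fetch_nodes())) on either machine should list every NodeManager that this ResourceManager knows about. If only firstMachine shows up, the second NodeManager has not registered with the ResourceManager on firstMachine.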
Also, a MapReduce job takes pretty much the same time with or without the second machine running. I tried it with the wordcount example, using 4 big files.
My question is: how can I check whether my MapReduce job is actually executed on multiple (2 in this case) machines, and not just on the one where it is launched?
If my Hadoop MapReduce in fact does NOT see the other Hadoop instance, how do I make it see it (how can I configure it to run MapReduce jobs on 2 machines)?