I want to know how to execute map reduce code on Hadoop 2.9.0 multi-node cluster? I wanna understand which node process which input. Actually, How to find every part of input data is processed by which mapper? I executed following python code on master:
import sys
import socket
for line in sys.stdin:
line = line.strip()
words = line.split()
for word in words:
print('%s\t%s\t%s' % (word, 1, socket.gethostname()))
I used socket.gethostname()
to finding hostname of nodes. I expecte output of this mapper be (e.g):
Bye 1 hadoopmaster
Goodbye 1 hadoopmaster
Hadoop 1 hadoopmaster
Hadoop 1 hadoopslave1
Hello 1 hadoopmaster
Hello 1 hadoopslave2
But is:
Bye 1 hadoopmaster
Goodbye 1 hadoopmaster
Hadoop 1 hadoopmaster
Hadoop 1 hadoopmaster
Hello 1 hadoopmaster
Hello 1 hadoopmaster
Is the code not running on the slave nodes?