I am interested in managing Hadoop shuffle traffic efficiently and using the network bandwidth effectively. To do this, I want to know how much shuffle traffic is generated by each DataNode. Shuffle traffic is nothing but the output of the mappers, so where is this mapper output saved? How can I get the size of the mapper output on each DataNode in real time? I appreciate your help.
I have configured a directory to store this mapper output, as below:
<property>
  <name>mapred.local.dir</name>
  <value>/app/hadoop/tmp/myoutput</value>
</property>
and I looked in that directory:
hduser@dn4:/app/hadoop/tmp/myoutput$ ls -lrt
total 16
drwxr-xr-x 2 hduser hadoop 4096 Dec 12 10:50 tt_log_tmp
drwx------ 3 hduser hadoop 4096 Dec 12 10:53 ttprivate
drwxr-xr-x 3 hduser hadoop 4096 Dec 12 10:53 taskTracker
drwxr-xr-x 4 hduser hadoop 4096 Dec 12 13:25 userlogs
but I could not find any mapper output here when I ran the MapReduce job.
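To make the "size in real time" part concrete, here is a minimal sketch of polling the total size of everything under the configured directory on one node. It assumes the mapper output is written somewhere below mapred.local.dir while the job runs, which is exactly what is unconfirmed here. The names dir_size_bytes and poll_sizes are hypothetical helpers, not part of Hadoop.

```python
import os
import time


def dir_size_bytes(root):
    """Return the total size in bytes of all files found under root."""
    total = 0
    # os.walk silently skips directories it is not allowed to read.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # lstat does not follow symlinks, so a link counts as itself.
                total += os.lstat(path).st_size
            except OSError:
                # Task files can disappear between listing and stat; skip them.
                pass
    return total


def poll_sizes(root, interval_s=1.0, samples=5):
    """Sample the directory size `samples` times, `interval_s` seconds apart.

    Returns a list of (unix_timestamp, size_in_bytes) tuples.
    """
    readings = []
    for i in range(samples):
        readings.append((time.time(), dir_size_bytes(root)))
        if i < samples - 1:
            time.sleep(interval_s)
    return readings
```

Calling poll_sizes("/app/hadoop/tmp/myoutput", interval_s=1.0, samples=60) on a node while a job is running would give one reading per second for a minute. It would need to run as a user that can read the directory (ttprivate above is readable only by hduser), and files that are created and deleted between two samples will not show up.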
Thanks