I am using Hadoop 2.2. I see that my jobs complete successfully, and I can browse the filesystem to find the output. However, when I browse http://NNode:8088/cluster/apps, I am unable to see any applications that have completed so far (I ran 3 wordcount jobs, but none of them shows up there).
Are there any configurations that need to be taken into account?
Here is the yarn-site.xml:
<property>
  <name>yarn.resourcemanager.hostname</name>
  <value>NNode</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
<!--
<property>
  <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
  <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
-->
Here is mapred-site.xml:
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
I have the job history server running too:
jps
4422 NameNode
5452 Jps
4695 SecondaryNameNode
4924 ResourceManager
72802 Jps
5369 JobHistoryServer
You can refer to "Hadoop is not showing my job in the job tracker even though it is running". I tested this in my cluster, and it works!
After applications are completed, their responsibility might be moved to the Job History Server. So check the Job History Server URL. It normally listens on port 19888, e.g. http://<job_history_server_address>:19888/jobhistory
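If that page does not show your jobs either, one thing to check is that mapred-site.xml points at the host actually running the JobHistoryServer. A minimal sketch, assuming it runs on NNode as in your setup (10020 and 19888 are the stock default ports):

<property>
  <name>mapreduce.jobhistory.address</name>
  <value>NNode:10020</value>
</property>
<property>
  <name>mapreduce.jobhistory.webapp.address</name>
  <value>NNode:19888</value>
</property>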
Log directories and log retention durations are configurable in yarn-site.xml. With YARN, you can even aggregate logs to a single (configurable) location.
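For example, a minimal yarn-site.xml sketch that turns on log aggregation (the directory and retention values below are only illustrative, not something your setup requires):

<property>
  <name>yarn.log-aggregation-enable</name>
  <value>true</value>
</property>
<property>
  <name>yarn.nodemanager.remote-app-log-dir</name>
  <value>/tmp/logs</value>
</property>
<property>
  <name>yarn.log-aggregation.retain-seconds</name>
  <value>604800</value>
</property>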
Sometimes, even though an application is listed, its logs are not available (I am not sure whether this is due to some bug in YARN). However, almost every time I was able to get the logs using the command line. Although there are multiple options, use help for details:
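For example (the application ID is a placeholder; take the real one from the ResourceManager/Job History UI or from the job's console output):

yarn logs -applicationId <application ID>

Running yarn logs without the required options should print its usage with the remaining flags. Note that this command reads the aggregated logs, so it relies on log aggregation being enabled (see the yarn-site.xml sketch above).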