I have run some Spark applications on a YARN cluster. The applications show up in the "All Applications" page of the YARN UI (http://host:8088/cluster), but the yarn application -list command doesn't return any results. What could be the cause of this?
When you use the "-list" option without the "-appTypes" or "-appStates" options, default filtering is applied for "application-types" and "states" (note the states in the output below). If none of your applications match the default filter, you will not get any results:
Total number of applications (application-types: [] and states: [SUBMITTED, ACCEPTED, RUNNING]):0
If you see the help for "-list", it states the following:
"List applications. Supports optional use of -appTypes to filter applications based on application type, and -appStates to filter applications based on application state".
This seems to be a bit misleading.
If you don't specify "-appStates", it defaults to the states "SUBMITTED", "ACCEPTED" and "RUNNING" for filtering. Please check the code below, from the "listApplications()" method of "org.apache.hadoop.yarn.client.cli.ApplicationCLI":
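The original snippet isn't reproduced here, so what follows is a paraphrased sketch of that logic rather than the verbatim Hadoop source; the class and enum names are real, but the wrapper method is mine:

import java.util.EnumSet;
import org.apache.hadoop.yarn.api.records.YarnApplicationState;

// Sketch of the default-state filtering in ApplicationCLI.listApplications().
public class ListStateFilter {
    static EnumSet<YarnApplicationState> resolveStates(String[] requested, boolean allAppStates) {
        EnumSet<YarnApplicationState> appStates = EnumSet.noneOf(YarnApplicationState.class);
        if (allAppStates) {
            // "-appStates ALL" expands to every state in the enum.
            return EnumSet.allOf(YarnApplicationState.class);
        }
        for (String s : requested) {
            appStates.add(YarnApplicationState.valueOf(s.trim().toUpperCase()));
        }
        if (appStates.isEmpty()) {
            // No "-appStates" given: fall back to in-flight applications only,
            // which is why finished applications vanish from "yarn application -list".
            appStates.add(YarnApplicationState.SUBMITTED);
            appStates.add(YarnApplicationState.ACCEPTED);
            appStates.add(YarnApplicationState.RUNNING);
        }
        return appStates;
    }
}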
As per the code above, the following behavior results:
CMD> yarn application -list
Total number of applications (application-types: [] and states: [SUBMITTED, ACCEPTED, RUNNING]):0
CMD> yarn application -list -appStates ALL
Total number of applications (application-types: [] and states: [NEW, NEW_SAVING, SUBMITTED, ACCEPTED, RUNNING, FINISHED, FAILED, KILLED]):268
CMD> yarn application -list -appStates FINISHED
Total number of applications (application-types: [] and states: [FINISHED]):136
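So, to list your completed Spark applications, pass the states (and optionally the application types) explicitly. The "SPARK" type below assumes your jobs register with that application type, which Spark on YARN does by default:

CMD> yarn application -list -appStates FINISHED,FAILED,KILLED -appTypes SPARK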
It turns out that I had enabled log aggregation in YARN but had set yarn.nodemanager.remote-app-log-dir to a custom HDFS directory (/tmp/yarnlogs). Logs were actually getting aggregated under /tmp/yarnlogs in HDFS, but the yarn command was still searching for them at the default location (/tmp/logs). Changing the property back to its default value fixed it for me.
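For reference, this is the property in question as it would appear in yarn-site.xml; the /tmp/logs value shown here is the stock default:

<property>
  <!-- HDFS directory where the NodeManagers aggregate container logs. -->
  <name>yarn.nodemanager.remote-app-log-dir</name>
  <value>/tmp/logs</value>
</property>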
NOTE: If the log aggregation directory is misconfigured, it also causes an error when trying to access job history from the web UI, which looks like:
Log aggregation has not completed or is not enabled
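If you hit this, a quick way to confirm the mismatch is to compare the configured directory against where the tools are actually looking (the application ID below is a placeholder):

CMD> hdfs dfs -ls /tmp/logs        (default location the yarn command reads)
CMD> hdfs dfs -ls /tmp/yarnlogs    (custom location where the logs actually landed)
CMD> yarn logs -applicationId application_1234567890123_0001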