Why do my application level logs disappear when ex

Posted 2019-01-20 18:48

Question:

I'm using Oozie in a CDH5 environment, along with the Oozie web console. I can't see any of the logs from my application. I can see Hadoop logs, Spark logs, etc., but no application-specific logs.

In my application I've included src/main/resources/log4j.properties:

# Root logger option
log4j.rootLogger=INFO, stdout

# Direct log messages to stdout
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.Target=System.out
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss} %-5p %c{1}:%L - %m%n

In my Oozie workflow I have java actions and spark actions.

It is also important to note that when I run my application from the command line, I do see my application-level logs.

Answer 1:

Oozie runs each Action in a different "launcher" job -- actually a YARN job with a single mapper (see exceptions below).

Whenever you see an "external ID" in the form job_000000000_0000, you can reach the YARN logs for the matching application_000000000_0000, which has the same numeric suffix (yeah, "job" is the legacy naming convention from Hadoop 1, still used by the JobHistory service, but YARN has another naming convention).

Your application output is actually dumped into the YARN logs for that Oozie "launcher":

  • your StdErr is dumped as-is and can be retrieved in the "stderr" section
  • your StdOut is dumped with a prefix on each line (that prefix is used by Oozie to manage its <capture-output/> trick for Shell and Pig actions) at the end of the atrociously verbose "stdout" section
  • and nothing gets into the "syslog" section AFAIK

Bottom line:

  1. run oozie job -info ****** to get the list of Actions and the corresponding "external IDs" for your Oozie workflow execution
  2. for each job_*****_** legacy ID, run yarn logs -applicationId application_*****_** | more to skim the global YARN logs, then zoom in on your specific app logs
  3. now you can try to automate that thing... have fun B-)
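If you do want to automate steps 1 and 2, something along these lines could be a starting point. It is only a rough sketch: the function names are made up for illustration, it assumes the external IDs show up in the `oozie job -info` output as whitespace-separated job_<digits>_<digits> tokens, and it assumes OOZIE_URL is set in your environment so the oozie CLI can find the server. The only commands it runs are the two mentioned in the steps above.

```python
import re
import subprocess
import sys

# Legacy Hadoop 1 style ID: job_<digits>_<digits>.
# Assumption: this is how the external ID appears in `oozie job -info` output.
JOB_ID = re.compile(r"\bjob_(\d+_\d+)\b")


def external_job_ids(info_text):
    """Return the distinct job_* IDs found in the text, in first-seen order."""
    seen = []
    for suffix in JOB_ID.findall(info_text):
        job_id = "job_" + suffix
        if job_id not in seen:
            seen.append(job_id)
    return seen


def to_application_id(job_id):
    """Swap the legacy job_ prefix for YARN's application_ prefix."""
    match = JOB_ID.fullmatch(job_id)
    if match is None:
        raise ValueError("not a job_* ID: " + repr(job_id))
    return "application_" + match.group(1)


def yarn_logs_command(job_id):
    """Build the `yarn logs` command line for one launcher job."""
    return ["yarn", "logs", "-applicationId", to_application_id(job_id)]


def run(cmd):
    """Run a command and return its stdout as text (raises if it fails)."""
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


if __name__ == "__main__":
    # Usage (hypothetical script name): python oozie_logs.py <oozie-workflow-id>
    workflow_id = sys.argv[1]
    info = run(["oozie", "job", "-info", workflow_id])
    for job_id in external_job_ids(info):
        print("=====", job_id, "=====")
        print(run(yarn_logs_command(job_id)))
```

The output is still the full YARN log dump for each launcher, so you will probably want to pipe it through grep or a pager to find your own application lines in the "stdout" and "stderr" sections.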


Exceptions to the "launcher" Oozie job principle -- the E-mail Action / Filesystem Action are just API calls executed directly from the Oozie server process; and the MapReduce Action spawns a regular YARN job with multiple Mappers and Reducers.