Send log4j2 stack traces over syslog

2019-07-24 20:17发布

问题:

I am trying to log stack traces into Logstash.

The logging stack is ELK (ElasticSearch, Logstash, Kibana).

The application producing logs is a Java application, using slf4j as a logging interface, and log4j2 as the logging implementation.

The log4j2.xml declares this syslog Appender, with the RFC5424 format:

<Appenders>
  <Syslog name="RFC5424" format="RFC5424" host="localhost" port="8514"
          protocol="TCP" appName="MyApp" includeMDC="true" mdcId="mdc"
          facility="LOCAL0" enterpriseNumber="18060" newLine="true"
          messageId="Audit" id="App">
    <LoggerFields>
      <KeyValuePair key="thread" value="%t"/>
      <KeyValuePair key="priority" value="%p"/>
      <KeyValuePair key="category" value="%c"/>
      <KeyValuePair key="exception" value="%ex{full}"/>
    </LoggerFields>
  </Syslog>
</Appenders>

I log a Throwable from the Java application like so:

org.slf4j.LoggerFactory.getLogger("exception_test").error("Testing errors", new RuntimeException("Exception message"));

When an exception is logged, Logstash traces something like this to show me what it persists:

{
   "@timestamp":"2016-11-08T11:08:10.387Z",
   "port":60397,
   "@version":"1",
   "host":"127.0.0.1",
   "message":"<131>1 2016-11-08T11:08:10.386Z MyComputer.local MyApp - Audit [mdc@18060 category=\"exception_test\" exception=\"java.lang.RuntimeException: Exception message",
   "type":"syslog",
   "tags":[
      "_grokparsefailure"
   ]
}

And I confirm that Kibana displays exactly the same JSON within the _source field of one of its log entries.

There's a problem here: no stack trace is saved. And the message, "Testing errors", is lost.

The "tags":["_grokparsefailure"] is unfortunate but not related to this question.


I tried adding <ExceptionPattern/> to see if it would change anything:

<Syslog name="RFC5424" format="RFC5424" host="localhost" port="8514"
        protocol="TCP" appName="MyApp" includeMDC="true" mdcId="mdc"
        facility="LOCAL0" enterpriseNumber="18060" newLine="true"
        messageId="Audit" id="App">
  <LoggerFields>
    <KeyValuePair key="thread" value="%t"/>
    <KeyValuePair key="priority" value="%p"/>
    <KeyValuePair key="category" value="%c"/>
    <KeyValuePair key="exception" value="%ex{full}"/>
  </LoggerFields>
  <ExceptionPattern>%ex{full}</ExceptionPattern>
</Syslog>

<ExceptionPattern/> replaces the log message, and also (sadly) omits all loggerFields. But it does give me a class name and line number:

{
   "@timestamp":"2016-11-08T11:54:03.835Z",
   "port":60397,
   "@version":"1",
   "host":"127.0.0.1",
   "message":"at com.stackoverflow.LogTest.throw(LogTest.java:149)",
   "type":"syslog",
   "tags":[
      "_grokparsefailure"
   ]
}

Again: no stack trace. And again: the message, "Testing errors", is lost.


How can I use log4j2 to log stack traces into Logstash? I don't necessarily have to use the syslog appender.

Essentially the constraints are:

  • Not be locked in to any particular logging infrastructure (this is why I used syslog)
  • Multi-line stack traces need to be understood as being a single log entry. It's undesirable for "each line of the stack trace" to be "a separate log message"
  • Stack traces must be able to be subjected to filters. A typical exception of mine can have a page-long stack trace. I want to filter out frames like Spring.

回答1:

Log4j 2.5's SyslogAppender can only send stack traces over UDP.

<Syslog name="RFC5424" format="RFC5424" host="localhost" port="8514"
        protocol="UDP" appName="MyApp" includeMDC="true" mdcId="mdc"
        facility="LOCAL0" enterpriseNumber="18060" newLine="true"
        messageId="LogTest" id="App">
  <LoggerFields>
    <KeyValuePair key="thread" value="%t"/>
    <KeyValuePair key="priority" value="%p"/>
    <KeyValuePair key="category" value="%c"/>
    <KeyValuePair key="exception" value="%ex{full}"/>
  </LoggerFields>
  <ExceptionPattern>%ex{full}</ExceptionPattern>
</Syslog>

With UDP: both ExceptionPattern and LoggerFields.KeyValuePair["exception"] start working as solutions for multiline stack traces.

This is what logstash prints when I sent an exception over UDP via syslog:

{
    "@timestamp" => 2016-11-14T13:23:38.304Z,
      "@version" => "1",
          "host" => "127.0.0.1",
       "message" => "<131>1 2016-11-14T13:23:38.302Z BirchBox.local MyApp - LogTest [mdc@18060 category=\"com.stackoverflow.Deeply\" exception=\"java.lang.RuntimeException: Exception message\n\tat com.stackoverflow.Deeply.complain(Deeply.java:10)\n\tat com.stackoverflow.Nested.complain(Nested.java:8)\n\tat com.stackoverflow.Main.main(Main.java:20)\n\" priority=\"ERROR\" thread=\"main\"] Example error\njava.lang.RuntimeException: Exception message\n\tat com.stackoverflow.Deeply.complain(Deeply.java:10)\n\tat com.stackoverflow.Nested.complain(Nested.java:8)\n\tat com.stackoverflow.Main.main(Main.java:20)",
          "type" => "syslog",
          "tags" => [
        [0] "_grokparsefailure"
    ]
}

Inside [mdc@18060 exception=\"…\"] we get the LoggerFields.KeyValuePair["exception"] stack trace.

In addition to this: the stack trace is inserted into the logged message itself, thanks to ExceptionPattern.


For reference: this is what logstash prints when I send the exception over TCP via syslog (i.e. the same SyslogAppender as described above, but with protocol="TCP" instead):

{
    "@timestamp" => 2016-11-14T19:56:30.293Z,
          "port" => 63179,
      "@version" => "1",
          "host" => "127.0.0.1",
       "message" => "<131>1 2016-11-14T19:56:30.277Z BirchBox.local MyApp - Audit [mdc@18060 category=\"com.stackoverflow.Deeply\" exception=\"java.lang.RuntimeException: Exception message",
          "type" => "syslog",
          "tags" => [
        [0] "_grokparsefailure"
    ]
}
{
    "@timestamp" => 2016-11-14T19:56:30.296Z,
          "port" => 63179,
      "@version" => "1",
          "host" => "127.0.0.1",
       "message" => "at com.stackoverflow.Deeply.complain(Deeply.java:10)",
          "type" => "syslog",
          "tags" => [
        [0] "_grokparsefailure"
    ]
}
{
    "@timestamp" => 2016-11-14T19:56:30.296Z,
          "port" => 63179,
      "@version" => "1",
          "host" => "127.0.0.1",
       "message" => "at com.stackoverflow.Nested.complain(Nested.java:8)",
          "type" => "syslog",
          "tags" => [
        [0] "_grokparsefailure"
    ]
}
{
    "@timestamp" => 2016-11-14T19:56:30.296Z,
          "port" => 63179,
      "@version" => "1",
          "host" => "127.0.0.1",
       "message" => "at com.stackoverflow.Main.main(Main.java:20)",
          "type" => "syslog",
          "tags" => [
        [0] "_grokparsefailure"
    ]
}
{
    "@timestamp" => 2016-11-14T19:56:30.296Z,
          "port" => 63179,
      "@version" => "1",
          "host" => "127.0.0.1",
       "message" => "\" priority=\"ERROR\" thread=\"main\"] Example error",
          "type" => "syslog",
          "tags" => [
        [0] "_grokparsefailure"
    ]
}
{
    "@timestamp" => 2016-11-14T19:56:30.296Z,
          "port" => 63179,
      "@version" => "1",
          "host" => "127.0.0.1",
       "message" => "java.lang.RuntimeException: Exception message",
          "type" => "syslog",
          "tags" => [
        [0] "_grokparsefailure"
    ]
}
{
    "@timestamp" => 2016-11-14T19:56:30.297Z,
          "port" => 63179,
      "@version" => "1",
          "host" => "127.0.0.1",
       "message" => "at com.stackoverflow.Deeply.complain(Deeply.java:10)",
          "type" => "syslog",
          "tags" => [
        [0] "_grokparsefailure"
    ]
}
{
    "@timestamp" => 2016-11-14T19:56:30.298Z,
          "port" => 63179,
      "@version" => "1",
          "host" => "127.0.0.1",
       "message" => "at com.stackoverflow.Nested.complain(Nested.java:8)",
          "type" => "syslog",
          "tags" => [
        [0] "_grokparsefailure"
    ]
}
{
    "@timestamp" => 2016-11-14T19:56:30.298Z,
          "port" => 63179,
      "@version" => "1",
          "host" => "127.0.0.1",
       "message" => "at com.stackoverflow.Main.main(Main.java:20)",
          "type" => "syslog",
          "tags" => [
        [0] "_grokparsefailure"
    ]
}
{
    "@timestamp" => 2016-11-14T19:56:30.299Z,
          "port" => 63179,
      "@version" => "1",
          "host" => "127.0.0.1",
       "message" => "",
          "type" => "syslog",
          "tags" => [
        [0] "_grokparsefailure"
    ]
}

It looks like TCP does actually "work", but splits the single log message into many syslog messages (for example when \n is encountered).