Configure fluentd to properly parse and ship Java stack traces to Elasticsearch as a single message

Posted 2019-08-17 09:12

Question:

Our service runs as a docker container. A given limitation is that the docker logging driver cannot be changed to anything other than the default json-file driver. The (Scala micro)service outputs a log that looks like this:

{"log":"10:30:12.375 [application-akka.actor.default-dispatcher-13] [WARN] [rulekeepr-615239361-v5mtn-7]- c.v.r.s.logic.RulekeeprLogicProvider(91) - decision making have failed unexpectedly\n","stream":"stdout","time":"2017-05-08T10:30:12.376485994Z"}
{"log":"java.lang.RuntimeException: Error extracting fields to make a lookup for a rule at P2: [failed calculating amount/amountEUR/directive: [failed getting accountInfo of companyId:3303 from deadcart: unexpected status returned: 500]]\n","stream":"stdout","time":"2017-05-08T10:30:12.376528449Z"}
{"log":"\u0009at org.assbox.rulekeepr.services.BasicRuleService$$anonfun$lookupRule$2.apply(BasicRuleService.scala:53)\n","stream":"stdout","time":"2017-05-08T10:30:12.376537277Z"}
{"log":"\u0009at org.assbox.rulekeepr.services.BasicRuleService$$anonfun$lookupRule$2.apply(BasicRuleService.scala:53)\n","stream":"stdout","time":"2017-05-08T10:30:12.376542826Z"}
{"log":"\u0009at scala.concurrent.Future$$anonfun$transform$1$$anonfun$apply$2.apply(Future.scala:224)\n","stream":"stdout","time":"2017-05-08T10:30:12.376548224Z"}
{"log":"Caused by: java.lang.RuntimeException: failed calculating amount/amountEUR/directive: [failed getting accountInfo of companyId:3303 from deadcart: unexpected status returned: 500]\n","stream":"stdout","time":"2017-05-08T10:30:12.376674554Z"}
{"log":"\u0009at org.assbox.rulekeepr.services.logic.TlrComputedFields$$anonfun$calculatedFields$1.applyOrElse(AbstractComputedFields.scala:39)\n","stream":"stdout","time":"2017-05-08T10:30:12.376680922Z"}
{"log":"\u0009at org.assbox.rulekeepr.services.logic.TlrComputedFields$$anonfun$calculatedFields$1.applyOrElse(AbstractComputedFields.scala:36)\n","stream":"stdout","time":"2017-05-08T10:30:12.376686377Z"}
{"log":"\u0009at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)\n","stream":"stdout","time":"2017-05-08T10:30:12.376691228Z"}
{"log":"\u0009... 19 common frames omitted\n","stream":"stdout","time":"2017-05-08T10:30:12.376720255Z"}
{"log":"Caused by: java.lang.RuntimeException: failed getting accountInfo of companyId:3303 from deadcart: unexpected status returned: 500\n","stream":"stdout","time":"2017-05-08T10:30:12.376724303Z"}
{"log":"\u0009at org.assbox.rulekeepr.services.mixins.DCartHelper$$anonfun$accountInfo$1.apply(DCartHelper.scala:31)\n","stream":"stdout","time":"2017-05-08T10:30:12.376729945Z"}
{"log":"\u0009at org.assbox.rulekeepr.services.mixins.DCartHelper$$anonfun$accountInfo$1.apply(DCartHelper.scala:24)\n","stream":"stdout","time":"2017-05-08T10:30:12.376734254Z"}
{"log":"\u0009... 19 common frames omitted\n","stream":"stdout","time":"2017-05-08T10:30:12.37676087Z"}

How can I harness fluentd directives to properly combine a log event like the one above, including its stack trace, so that it is shipped to Elasticsearch as a single message?

I have full control of the logback appender pattern used, so I can change the order in which log values occur, and even change the appender class.

We're working with k8s, and it turns out it's not straightforward to change the docker logging driver, so we're looking for a solution that can handle the given example.

I don't care so much about extracting the log level, thread, and logger into specific keys so I could later easily filter by them in Kibana. That would be nice to have, but is less important. What is important is to accurately parse the timestamp, down to the milliseconds, and use it as the actual log event timestamp when it is shipped to Elasticsearch.

Answer 1:

You can use fluent-plugin-concat.

For example, with Fluentd v0.14.x:

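# Tail the docker json-file logs of every container, parse each line as JSON,
# and route the resulting events to the @INPUT label.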
<source>
  @type tail
  path /var/log/containers/*.log
  pos_file /var/log/fluentd-containers.log.pos
  tag kubernetes.*
  read_from_head true
  <parse>
    @type json
  </parse>
  @label @INPUT
</source>

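# Stitch multi-line events back together: a line starting with a timestamp opens a
# new event, while indented lines and lines starting with "java.lang" or "Caused by:"
# are continuations. Events still pending at timeout are routed to @PARSE via timeout_label.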
<label @INPUT>
  <filter kubernetes.**>
    @type concat
    key log
    multiline_start_regexp ^\d{2}:\d{2}:\d{2}\.\d+
    continuous_line_regexp ^(\s+|java.lang|Caused by:)
    separator ""
    flush_interval 3s
    timeout_label @PARSE
  </filter>
  <match kubernetes.**>
    @type relabel
    @label @PARSE
  </match>
</label>

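# Parse the concatenated "log" field with grok; extracted keys are prefixed with
# "log." and parse failures are recorded under "grokfailure".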
<label @PARSE>
  <filter kubernetes.**>
    @type parser
    key_name log
    inject_key_prefix log.
    <parse>
      @type multiline_grok
      grok_failure_key grokfailure
      <grok>
        pattern YOUR_GROK_PATTERN
      </grok>
    </parse>
  </filter>
  <match kubernetes.**>
    @type relabel
    @label @OUTPUT
  </match>
</label>

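# Print the assembled events for demonstration; in a real setup this is where the
# elasticsearch output would go.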
<label @OUTPUT>
  <match kubernetes.**>
    @type stdout
  </match>
</label>
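
The grok pattern is left as a placeholder above. As a rough sketch, assuming field names of my own choosing and the log layout shown in the question, the <grok> section could look like this:

<grok>
  # (?m) lets the trailing GREEDYDATA also capture the stack-trace lines that
  # concat appended after the first line of the event
  pattern (?m)^%{TIME:time} \[%{DATA:thread}\] \[%{LOGLEVEL:level}\] \[%{DATA:pod}\]- %{DATA:logger}\(%{INT:line}\) - %{GREEDYDATA:message}
</grok>

Note that the logback pattern in the question only emits the time of day, not the date, so to use it as the event timestamp you would either have to add the date to the appender pattern and point the parser's time_key/time_format at the captured field, or fall back to the full timestamp docker already writes into the time field.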

Similar issues:

  • https://github.com/fluent/fluent-plugin-grok-parser/issues/36
  • https://github.com/fluent/fluent-plugin-grok-parser/issues/37


Answer 2:

You can try using fluent-plugin-grok-parser, but I am having the same issue: the \u0009 tab character does not seem to be recognized, so fluent-plugin-detect-exceptions does not detect the multiline exceptions, at least not in my attempts so far.
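
For reference, a minimal sketch of how fluent-plugin-detect-exceptions is typically hooked into a configuration like the one above (the tag prefix and flush interval here are assumptions, not taken from the question's setup):

<match kubernetes.**>
  # Groups consecutive exception lines found in the "log" field into one event and
  # re-emits it with the "kubernetes" tag prefix stripped, so this block is not matched again.
  @type detect_exceptions
  remove_tag_prefix kubernetes
  message log
  languages java
  multiline_flush_interval 0.5
</match>

Whether the stack frames are picked up still depends on the plugin recognizing the tab-prefixed lines, which is exactly the problem described above.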