I've created Amazon SQS and SNS logback appenders using Amazon's Java SDK. The basic appenders use the synchronous Java APIs, but I've also created asynchronous versions of both by extending the ch.qos.logback.classic.AsyncAppender class.
Stopping the logback logger context with the async appenders does not work as expected, though. When the context is stopped, all async appenders try to flush the remaining events before exiting. The problem originates from the ch.qos.logback.core.AsyncAppenderBase#stop method, which interrupts the worker thread. The interrupt is triggered while the Amazon SDK is still processing the queued events and results in a com.amazonaws.AbortedException. In my tests the AbortedException happened while the SDK was processing a response from the API, so the actual message went through, but this might not always be the case.
Is it intended that logback interrupts the worker thread even though the worker should still process the remaining event queue? And if so, how can I work around the AbortedException caused by the interrupt? I could override the whole stop method and remove the interrupt, but that would require copy-pasting most of the implementation.
I finally managed to figure out a solution. It is probably not optimal and is far from simple, but it works.
My first attempt was to use the asynchronous versions of the AWS SDK APIs with the executor provided by logback, because with an internal executor the interrupt problem could be avoided. This didn't work out because the work queue is shared, whereas here the queue must be appender-specific so that it can be stopped correctly. So I needed a separate executor for each appender.
First I needed an executor for the AWS clients. The catch with the executor is that the provided thread factory must create daemon threads; otherwise it will block indefinitely if logback's JVM shutdown hook is used.
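A minimal sketch of such an executor, using only the JDK. The class name DaemonExecutors, the method names, and the thread name prefix are made up for illustration and are not taken from the original implementation.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical helper: builds a per-appender executor backed by daemon threads. */
final class DaemonExecutors {

    private DaemonExecutors() {}

    /** Returns a thread factory whose threads never keep the JVM alive. */
    static ThreadFactory daemonThreadFactory(String namePrefix) {
        AtomicInteger counter = new AtomicInteger();
        return runnable -> {
            Thread thread = new Thread(runnable, namePrefix + "-" + counter.incrementAndGet());
            thread.setDaemon(true); // daemon threads do not block JVM exit
            return thread;
        };
    }

    /** Creates a fixed-size pool that is owned by a single appender. */
    static ExecutorService newExecutor(String namePrefix, int threads) {
        return Executors.newFixedThreadPool(threads, daemonThreadFactory(namePrefix));
    }
}
```

The resulting executor would then be handed to the async AWS client of each appender, assuming the client in use accepts an ExecutorService.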
The next issue was how to stop the appender correctly despite the interrupt. This required handling the InterruptedException with a retry, because the executor would otherwise skip waiting for the queue to flush.
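One way to express that retry, as a sketch with JDK classes only. ExecutorShutdown and shutdownAndFlush are hypothetical names, and the timeout handling is an assumption rather than the original code.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper: shuts an executor down and waits for its queue, retrying if interrupted. */
final class ExecutorShutdown {

    private ExecutorShutdown() {}

    /**
     * Stops accepting new tasks and waits up to timeoutMillis for queued tasks to finish.
     * An interrupt during the wait is remembered and the wait is retried with the time left.
     * Returns true if the executor terminated before the deadline.
     */
    static boolean shutdownAndFlush(ExecutorService executor, long timeoutMillis) {
        executor.shutdown(); // already queued tasks still run
        boolean interrupted = false;
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        try {
            while (true) {
                long remaining = Math.max(deadline - System.nanoTime(), 0L);
                try {
                    return executor.awaitTermination(remaining, TimeUnit.NANOSECONDS);
                } catch (InterruptedException e) {
                    interrupted = true; // catching cleared the flag, so the retry can wait
                }
            }
        } finally {
            if (interrupted) {
                Thread.currentThread().interrupt(); // restore the flag for the caller
            }
        }
    }
}
```

Because the interrupt flag is restored at the end, code further up the call stack can still see that the thread was interrupted.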
Normal logback appenders are expected to work synchronously and therefore shouldn't lose logging events even without a proper shutdown hook. This is a problem with the asynchronous AWS SDK API calls. I decided to use a countdown latch to provide blocking appender behavior.
And to handle waiting with the latch.
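The latch idea can be sketched as follows. The Runnable stands in for the real asynchronous SDK call, and LatchedSend, sendAsync and await are hypothetical names chosen for this illustration.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for an async SDK call that signals completion through a latch. */
final class LatchedSend {

    private LatchedSend() {}

    /** Submits the send and returns a latch that reaches zero once the send finished or failed. */
    static CountDownLatch sendAsync(ExecutorService executor, Runnable send) {
        CountDownLatch latch = new CountDownLatch(1);
        executor.execute(() -> {
            try {
                send.run();
            } finally {
                latch.countDown(); // release the waiting appender on success and on failure
            }
        });
        return latch;
    }

    /**
     * Blocks until the latch is released or the timeout elapses.
     * Returns true if the send completed in time; on interrupt the flag is restored
     * and false is returned.
     */
    static boolean await(CountDownLatch latch, long timeoutMillis) {
        try {
            return latch.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

Counting down in a finally block means a failed send cannot leave the appending thread blocked until the timeout.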
And then all bundled together. The following example is a simplified version of the real implementation, showing just the parts relevant to this issue.
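As a rough, dependency-free illustration of how the pieces fit (this is not the original appender code), the sketch below replaces the AWS client with a plain Consumer and leaves out the logback base class. BlockingAsyncSender and its fields are hypothetical names.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** Hypothetical stand-in for an async AWS appender; the Consumer replaces the SDK client. */
final class BlockingAsyncSender {

    private final ExecutorService executor;
    private final Consumer<String> client;
    private final long maxFlushMillis;

    BlockingAsyncSender(Consumer<String> client, long maxFlushMillis) {
        this.client = client;
        this.maxFlushMillis = maxFlushMillis;
        // Appender-specific executor with a daemon thread.
        this.executor = Executors.newSingleThreadExecutor(runnable -> {
            Thread thread = new Thread(runnable, "aws-appender");
            thread.setDaemon(true);
            return thread;
        });
    }

    /** Queues the message and blocks until it is sent; returns false on timeout or interrupt. */
    boolean append(String message) {
        CountDownLatch latch = new CountDownLatch(1);
        executor.execute(() -> {
            try {
                client.accept(message);
            } finally {
                latch.countDown();
            }
        });
        try {
            return latch.await(maxFlushMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // the message stays queued for stop() to flush
            return false;
        }
    }

    /** Flushes the queue, retrying the wait if the calling thread is interrupted. */
    boolean stop() {
        executor.shutdown();
        boolean interrupted = false;
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(maxFlushMillis);
        try {
            while (true) {
                long remaining = Math.max(deadline - System.nanoTime(), 0L);
                try {
                    return executor.awaitTermination(remaining, TimeUnit.NANOSECONDS);
                } catch (InterruptedException e) {
                    interrupted = true; // retry with the remaining time
                }
            }
        } finally {
            if (interrupted) {
                Thread.currentThread().interrupt();
            }
        }
    }
}
```

In this sketch an interrupted append gives up waiting but leaves the message in the queue, and stop still waits for the queue to drain, which mirrors the interrupt-then-flush sequence described in the question.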
All this was required to handle the following cases properly:
The above is used in the open source project Logback extensions, which I maintain.