How is it possible that 3 threads are in a blocked state?

Posted 2020-07-28 16:44

Question:

In our production environment, the WebLogic server hangs for half an hour. It looks like it has deadlocked threads, but after investigating the thread dumps, 3 threads are blocked on the same lock while no other thread owns this lock. Here are the stack traces.

Do you have any reasonable explanation for this situation?

Here are the blocked threads:

"pool-1013-thread-5" prio=7 tid=600000000842be00 nid=17280 lwp_id=518677 waiting for monitor entry [9fffffffe6aff000..9fffffffe6b00bd0]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at org.apache.log4j.Category.callAppenders(Category.java:201)
        - waiting to lock <9ffffffde1e7ec88> (a org.apache.log4j.spi.RootLogger)
        at org.apache.log4j.Category.forcedLog(Category.java:388)
        at org.apache.log4j.Category.log(Category.java:853)
        at org.slf4j.impl.Log4jLoggerAdapter.debug(Log4jLoggerAdapter.java:173)
        at org.hibernate.jdbc.AbstractBatcher.logOpenResults(AbstractBatcher.java:426)
        at org.hibernate.jdbc.AbstractBatcher.getResultSet(AbstractBatcher.java:210)
        at org.hibernate.loader.Loader.getResultSet(Loader.java:1808)

"pool-1013-thread-4" prio=7 tid=6000000008413400 nid=17279 lwp_id=518676 waiting for monitor entry [9fffffffe6eff000..9fffffffe6f00b50]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at org.apache.log4j.Category.callAppenders(Category.java:201)
        - waiting to lock <9ffffffde1e7ec88> (a org.apache.log4j.spi.RootLogger)
        at org.apache.log4j.Category.forcedLog(Category.java:388)
        at org.apache.log4j.Category.log(Category.java:853)
        at org.slf4j.impl.Log4jLoggerAdapter.debug(Log4jLoggerAdapter.java:173)
        at org.hibernate.loader.Loader.getRow(Loader.java:1197)
        at org.hibernate.loader.Loader.getRowFromResultSet(Loader.java:603)
        at org.hibernate.loader.Loader.doQuery(Loader.java:724)
        at org.hibernate.loader.Loader.doQueryAndInitializeNonLazyCollections(Loader.java:259)
        at org.hibernate.loader.Loader.loadEntity(Loader.java:1881)
        at org.hibernate.loader.entity.AbstractEntityLoader.load(AbstractEntityLoader.java:71)
        at org.hibernate.loader.entity.AbstractEntityLoader.load(AbstractEntityLoader.java:65)
        at org.hibernate.persister.entity.AbstractEntityPersister.load(AbstractEntityPersister.java:3072)
        at org.hibernate.event.def.DefaultLoadEventListener.loadFromDatasource(DefaultLoadEventListener.java:434)
        at org.hibernate.event.def.DefaultLoadEventListener.doLoad(DefaultLoadEventListener.java:415)
        at org.hibernate.event.def.DefaultLoadEventListener.load(DefaultLoadEventListener.java:165)
        at org.hibernate.event.def.DefaultLoadEventListener.proxyOrLoad(DefaultLoadEventListener.java:223)
        at org.hibernate.event.def.DefaultLoadEventListener.onLoad(DefaultLoadEventListener.java:126)
        at org.hibernate.impl.SessionImpl.fireLoad(SessionImpl.java:905)

"pool-1013-thread-3" prio=7 tid=6000000008411c00 nid=17278 lwp_id=518675 waiting for monitor entry [9fffffffe70ff000..9fffffffe7100ad0]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at org.apache.log4j.Category.callAppenders(Category.java:201)
        - waiting to lock <9ffffffde1e7ec88> (a org.apache.log4j.spi.RootLogger)
        at org.apache.log4j.Category.forcedLog(Category.java:388)
        at org.apache.log4j.Category.log(Category.java:853)
        at org.slf4j.impl.Log4jLoggerAdapter.debug(Log4jLoggerAdapter.java:173)
        at org.hibernate.jdbc.AbstractBatcher.logOpenResults(AbstractBatcher.java:426)
        at org.hibernate.jdbc.AbstractBatcher.getResultSet(AbstractBatcher.java:210)
        at org.hibernate.loader.Loader.getResultSet(Loader.java:1808)
        at org.hibernate.loader.Loader.doQuery(Loader.java:697)

Answer 1:

At first I suspected a Lock object not being unlocked in a finally block. But then I had a look at the two thread dumps you provided.

Indeed, 3 threads are blocked on the same lock, namely the Log4J Category class. According to the thread dump, this is the code they are all blocked on:

for(Category c = this; c != null; c=c.parent) {
  synchronized(c) {  // LOCKED HERE
    if(c.aai != null) {
      writes += c.aai.appendLoopOnAppenders(event);
    }
    if(!c.additive) {
      break;
    }
  }
}
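
For illustration, here is a minimal, self-contained sketch against the log4j 1.x API - hypothetical, not the poster's actual setup - of how a single appender that stalls inside append() produces exactly this blocking pattern: one thread holds the RootLogger monitor inside callAppenders(), and every other logging thread ends up BLOCKED, "waiting to lock" it (in this simplified case the owning thread does show up in the dump, unlike in the dumps above):

import org.apache.log4j.AppenderSkeleton;
import org.apache.log4j.Logger;
import org.apache.log4j.spi.LoggingEvent;

// Hypothetical appender whose append() stalls, simulating e.g. a stuck
// console or socket write.
class StuckAppender extends AppenderSkeleton {
  protected void append(LoggingEvent event) {
    try {
      Thread.sleep(60000); // simulate a write that takes a very long time
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }
  public void close() {}
  public boolean requiresLayout() { return false; }
}

public class CallAppendersBlockDemo {
  public static void main(String[] args) throws Exception {
    Logger root = Logger.getRootLogger();
    root.addAppender(new StuckAppender());

    // First thread enters callAppenders() and sits inside synchronized(c),
    // holding the RootLogger monitor while append() stalls.
    new Thread(() -> root.info("held")).start();
    Thread.sleep(500);

    // These threads now block on the same monitor - a thread dump shows them
    // BLOCKED at Category.callAppenders, just like pool-1013-thread-3/4/5.
    for (int i = 0; i < 3; i++) {
      new Thread(() -> root.debug("blocked")).start();
    }
  }
}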

TDA (the Thread Dump Analyzer) confirms that no other thread holds this lock, but it also provides a very useful tip:

This monitor doesn't have a thread locking it. This means a VM thread is holding it. If you see many monitors having no locking thread, this usually means the garbage collector is running. In this case you should consider analyzing the garbage collector output. If the dump has many monitors with no locking thread, a click on the dump node will give you additional information.

And further:

This thread dump contains monitors without locking thread information. This means the monitor is held by a system thread or some external resource. You should check the monitors without locking threads for more information.

Conclusion: you should enable garbage collection logging and see whether that is the root cause. Also check whether you or some library is doing something weird with Log4J (circular categories?) - just a wild guess. Helpful options:

-XX:+PrintGCDetails
-XX:+PrintGCTimeStamps
-XX:+PrintHeapAtGC
-Xloggc:gc.log


Answer 2:

I encountered a very similar problem. I believe this is related to https://issues.apache.org/bugzilla/show_bug.cgi?id=50614

In your case I'd suggest using jstack -l, and then parsing the actual locked objects out of the stacks (their addresses differ from the monitor addresses) - so for every "waiting to lock X", you should be able to find "- locked X" in the stack of some thread (when printed using jstack -l).

In my case I found that the culprit thread (holding the lock) was actually blocked on a write flush of the console appender (and remained so even though the stack trace said it was in a runnable state). I couldn't find the deadlock loop either.

So at least one way of reaching this problem involves an internal lock within the OS, which is triggered through a console write flush. Therefore a simple fix is to eliminate the console log appender (or increase its level).
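
If the console appender cannot simply be removed from the log4j configuration, here is a minimal programmatic sketch of the same idea using the log4j 1.x API (the appender name "CONSOLE" is an assumption - use whatever name your configuration declares):

import org.apache.log4j.AppenderSkeleton;
import org.apache.log4j.Level;
import org.apache.log4j.Logger;

// Raise the console appender's threshold so routine DEBUG/INFO logging no
// longer goes through the blocking console write - or drop the appender.
public class TameConsoleAppender {
  public static void main(String[] args) {
    Logger root = Logger.getRootLogger();

    // "CONSOLE" is a guess; it must match the appender name in your config.
    AppenderSkeleton console = (AppenderSkeleton) root.getAppender("CONSOLE");
    if (console != null) {
      console.setThreshold(Level.WARN); // or: root.removeAppender(console);
    }
  }
}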



Answer 3:

This bit me when I was using the log4j SocketAppender to write over a socket to logstash.

SocketAppender is blocking, so when an event is sent to logstash and logstash on the other side can't handle it (e.g. it is blocked on an elasticsearch call), your app will block.

Any other threads that try to log through that appender will simply sit and wait in the blocked state shown in the thread dump.

The best way around this is to use an async appender.
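
A minimal sketch of that with the log4j 1.x API (normally you would configure the equivalent in log4j.xml; the host, port, and buffer size below are assumptions):

import org.apache.log4j.AsyncAppender;
import org.apache.log4j.Logger;
import org.apache.log4j.net.SocketAppender;

// Route the socket appender through AsyncAppender's background dispatcher so
// application threads no longer block inside Category.callAppenders().
public class AsyncSocketLogging {
  public static void main(String[] args) {
    SocketAppender socket = new SocketAppender("logstash.example.com", 4560);

    AsyncAppender async = new AsyncAppender();
    async.setBufferSize(512);  // events buffered before the overflow policy applies
    async.setBlocking(false);  // discard overflow instead of blocking the callers
    async.addAppender(socket);

    Logger.getRootLogger().addAppender(async);
    Logger.getRootLogger().info("logging is now decoupled from the socket write");
  }
}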

See here for more info: How SocketAppender works