In our production environment, the WebLogic server hangs for half an hour. It looks as if it has deadlocked threads, but after investigating the thread dumps, three threads are blocked on the same lock while no other thread owns that lock. The stack traces are below.
Do you have any reasonable explanation for this situation?
Here are the blocked threads:
"pool-1013-thread-5" prio=7 tid=600000000842be00 nid=17280 lwp_id=518677 waiting for monitor entry [9fffffffe6aff000..9fffffffe6b00bd0] java.lang.Thread.State: BLOCKED (on object monitor) at org.apache.log4j.Category.callAppenders(Category.java:201) - waiting to lock <9ffffffde1e7ec88> (a org.apache.log4j.spi.RootLogger) at org.apache.log4j.Category.forcedLog(Category.java:388) at org.apache.log4j.Category.log(Category.java:853) at org.slf4j.impl.Log4jLoggerAdapter.debug(Log4jLoggerAdapter.java:173) at org.hibernate.jdbc.AbstractBatcher.logOpenResults(AbstractBatcher.java:426) at org.hibernate.jdbc.AbstractBatcher.getResultSet(AbstractBatcher.java:210) at org.hibernate.loader.Loader.getResultSet(Loader.java:1808)
"pool-1013-thread-4" prio=7 tid=6000000008413400 nid=17279 lwp_id=518676 waiting for monitor entry [9fffffffe6eff000..9fffffffe6f00b50] java.lang.Thread.State: BLOCKED (on object monitor) at org.apache.log4j.Category.callAppenders(Category.java:201) - waiting to lock <9ffffffde1e7ec88> (a org.apache.log4j.spi.RootLogger) at org.apache.log4j.Category.forcedLog(Category.java:388) at org.apache.log4j.Category.log(Category.java:853) at org.slf4j.impl.Log4jLoggerAdapter.debug(Log4jLoggerAdapter.java:173) at org.hibernate.loader.Loader.getRow(Loader.java:1197) at org.hibernate.loader.Loader.getRowFromResultSet(Loader.java:603) at org.hibernate.loader.Loader.doQuery(Loader.java:724) at org.hibernate.loader.Loader.doQueryAndInitializeNonLazyCollections(Loader.java:259) at org.hibernate.loader.Loader.loadEntity(Loader.java:1881) at org.hibernate.loader.entity.AbstractEntityLoader.load(AbstractEntityLoader.java:71) at org.hibernate.loader.entity.AbstractEntityLoader.load(AbstractEntityLoader.java:65) at org.hibernate.persister.entity.AbstractEntityPersister.load(AbstractEntityPersister.java:3072) at org.hibernate.event.def.DefaultLoadEventListener.loadFromDatasource(DefaultLoadEventListener.java:434) at org.hibernate.event.def.DefaultLoadEventListener.doLoad(DefaultLoadEventListener.java:415) at org.hibernate.event.def.DefaultLoadEventListener.load(DefaultLoadEventListener.java:165) at org.hibernate.event.def.DefaultLoadEventListener.proxyOrLoad(DefaultLoadEventListener.java:223) at org.hibernate.event.def.DefaultLoadEventListener.onLoad(DefaultLoadEventListener.java:126) at org.hibernate.impl.SessionImpl.fireLoad(SessionImpl.java:905)
"pool-1013-thread-3" prio=7 tid=6000000008411c00 nid=17278 lwp_id=518675 waiting for monitor entry [9fffffffe70ff000..9fffffffe7100ad0] java.lang.Thread.State: BLOCKED (on object monitor) at org.apache.log4j.Category.callAppenders(Category.java:201) - waiting to lock <9ffffffde1e7ec88> (a org.apache.log4j.spi.RootLogger) at org.apache.log4j.Category.forcedLog(Category.java:388) at org.apache.log4j.Category.log(Category.java:853) at org.slf4j.impl.Log4jLoggerAdapter.debug(Log4jLoggerAdapter.java:173) at org.hibernate.jdbc.AbstractBatcher.logOpenResults(AbstractBatcher.java:426) at org.hibernate.jdbc.AbstractBatcher.getResultSet(AbstractBatcher.java:210) at org.hibernate.loader.Loader.getResultSet(Loader.java:1808) at org.hibernate.loader.Loader.doQuery(Loader.java:697)
I encountered a very similar problem. I believe this is related to https://issues.apache.org/bugzilla/show_bug.cgi?id=50614
In your case I'd suggest running jstack -l and then parsing the actually locked objects out of the stacks (their addresses differ from the monitor addresses): for every "waiting to lock X" you should be able to find a "- locked X" in the stack of some thread (when printed using jstack -l).
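For example (the process id and dump file name are placeholders), the lock address from your dumps can be chased like this:

jstack -l <weblogic-pid> > dump.txt
# threads waiting on the monitor seen above
grep -n "waiting to lock <9ffffffde1e7ec88>" dump.txt
# the thread that actually holds it
grep -n "locked <9ffffffde1e7ec88>" dump.txt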
In my case I found that the culprit thread (the one holding the lock) was actually blocked on a write flush of the console appender, and it stayed that way even though the stack trace reported it as RUNNABLE. I couldn't find a deadlock loop either.
So at least one way of reaching this problem involves an internal lock within the OS, triggered through a console write flush. A simple fix is therefore to eliminate the console log appender (or raise its threshold).
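As a minimal sketch of that fix with log4j 1.x (the appender name "CONSOLE" is an assumption; use whatever name your configuration gives the console appender):

import org.apache.log4j.Appender;
import org.apache.log4j.AppenderSkeleton;
import org.apache.log4j.Level;
import org.apache.log4j.Logger;

public class ConsoleAppenderFix {
    public static void main(String[] args) {
        Logger root = Logger.getRootLogger();

        // "CONSOLE" is a placeholder; take the name from your log4j configuration.
        Appender console = root.getAppender("CONSOLE");

        if (console instanceof AppenderSkeleton) {
            // Raise the threshold so only WARN and above reach the console,
            // reducing how often threads contend on the console write/flush.
            ((AppenderSkeleton) console).setThreshold(Level.WARN);
        }

        // Or remove the console appender entirely:
        // root.removeAppender("CONSOLE");
    }
}

The same effect is normally achieved by editing the Threshold of the console appender in log4j.properties or log4j.xml rather than in code.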
At first I suspected a Lock object not being released in a finally block. But then I had a look at the two thread dumps you provided. Indeed, three threads are blocked on the same lock, namely the monitor of the Log4J Category (the RootLogger instance). According to the thread dump, this is the code they are all blocked on:
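(Reproduced approximately from log4j 1.2.x; the synchronized(c) block is what line 201 in the stack traces points at.)

public void callAppenders(LoggingEvent event) {
    int writes = 0;
    // each Category synchronizes on itself (and on its parents) while appending
    for (Category c = this; c != null; c = c.parent) {
        synchronized (c) {   // <- "waiting to lock <...> (a org.apache.log4j.spi.RootLogger)"
            if (c.aai != null) {
                writes += c.aai.appendLoopOnAppenders(event);
            }
            if (!c.additive) {
                break;
            }
        }
    }
    if (writes == 0) {
        repository.emitNoAppenderWarning(this);
    }
}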
TDA confirms that no other thread owns this lock, but it also provides a very useful tip for this pattern: when threads are waiting on a monitor that no thread in the dump owns, the monitor is most likely held by the JVM itself, for example during garbage collection.
Conclusion: you should enable garbage collection logging and check whether GC pauses are the root cause. Also check whether you or some library is doing something weird with Log4J (circular categories?) - just a wild guess. Helpful options:
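For illustration, commonly used HotSpot GC-logging flags of that era (the log file path is just a placeholder):

-verbose:gc
-XX:+PrintGCDetails
-XX:+PrintGCTimeStamps
-XX:+PrintGCApplicationStoppedTime
-Xloggc:/path/to/gc.log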
This bit me when I was using the log4j SocketAppender to write over a socket to logstash.
SocketAppender is blocking: when an event is sent to Logstash and Logstash can't handle it (e.g. it is itself blocked on an Elasticsearch call), your app will block.
Any other threads that try to log through that appender will simply sit and wait in the blocked state shown in the thread dump above.
The best way around this is to use an async appender (a minimal sketch follows below).
See here for more info: How SocketAppender works
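For illustration, a minimal programmatic sketch of wrapping the blocking SocketAppender in an AsyncAppender with log4j 1.x (the host, port, and buffer size are placeholders):

import org.apache.log4j.AsyncAppender;
import org.apache.log4j.Logger;
import org.apache.log4j.net.SocketAppender;

public class AsyncLogstashSetup {
    public static void main(String[] args) {
        // Placeholder host/port for the Logstash TCP input.
        SocketAppender socket = new SocketAppender("logstash.example.com", 4560);

        // AsyncAppender hands events to a background thread, so application
        // threads do not block on the socket write.
        AsyncAppender async = new AsyncAppender();
        async.setBufferSize(512);   // events buffered before the policy below kicks in
        async.setBlocking(false);   // drop events instead of blocking when the buffer is full
        async.addAppender(socket);

        Logger.getRootLogger().addAppender(async);
    }
}

In a real deployment this would normally go in log4j.xml instead (log4j 1.x's AsyncAppender cannot be configured through log4j.properties), but the idea is the same: the application threads only pay for enqueueing the event, not for the socket I/O.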