When a Java application hangs and you don't even know which use case triggers it, I understand that thread dumps can be useful for investigating.
But how can we easily derive useful data from the thread dumps to find where the problem is? The server application that I've been working with produces very long thread dumps, because it is an EJB architecture and the thread dumps contain many container threads that I'm not sure I should be looking at (i.e. threads that are not running my application code, but JBoss's code).
Yesterday I tried the Thread Dump Analyzer tool. The tool is definitely better than looking at the raw thread dumps in a text editor, because you can filter out threads that you're not interested in, see the thread list, click on a thread to see its details, compare thread dumps to find long running threads, etc. See screenshot below:
But there's still too much data to analyse - almost 300 threads. I don't know of any criteria that I could use to filter out all the JBoss threads, in which I'm not interested. I'm not sure if I should be looking at threads that are currently in "runnable" state only or if "waiting on condition" and "in Object.wait" are also important.
What approach would you normally follow, and which tools would you generally use?
One set of thread dumps alone will not be too helpful to get to the root cause.
The trick is to take 4 or 5 sets of thread dumps at an interval of 5 seconds between each, so that at the end you have a single log file covering around 20 to 25 seconds of activity on the app server.
What you want to check is whether a certain thread ID shows up at the same line of its Java stack trace in all the thread dumps; that is what a stuck thread or a long-running transaction looks like. In simpler terms, the transaction (say in an EJB or a database call) spans multiple thread dumps and hence needs more investigation.
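The "same thread, same line, across several dumps" check can also be sketched in code. The example below is only an illustration and not how Samurai or TDA work internally: it samples the current JVM in-process through `ThreadMXBean` rather than parsing the text dumps you would normally capture from outside with `jstack <pid>` or `kill -3 <pid>`. The class and method names (`StuckThreadSampler`, `sample`, `unchanged`) are made up for this sketch.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: samples the threads of the JVM it runs in.
class StuckThreadSampler {

    /** Takes up to `count` snapshots of all live threads, `intervalMillis` apart. */
    public static List<Map<Long, ThreadInfo>> sample(int count, long intervalMillis) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<Map<Long, ThreadInfo>> snapshots = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            Map<Long, ThreadInfo> byId = new HashMap<>();
            for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
                if (info != null) {
                    byId.put(info.getThreadId(), info);
                }
            }
            snapshots.add(byId);
            if (i < count - 1) {
                try {
                    Thread.sleep(intervalMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // stop sampling early
                    break;
                }
            }
        }
        return snapshots;
    }

    /** Returns threads that appear in every snapshot with an identical, non-empty stack trace. */
    public static List<ThreadInfo> unchanged(List<Map<Long, ThreadInfo>> snapshots) {
        List<ThreadInfo> result = new ArrayList<>();
        if (snapshots.isEmpty()) {
            return result;
        }
        for (ThreadInfo first : snapshots.get(0).values()) {
            StackTraceElement[] trace = first.getStackTrace();
            if (trace.length == 0) {
                continue;
            }
            boolean same = true;
            for (Map<Long, ThreadInfo> snapshot : snapshots) {
                ThreadInfo other = snapshot.get(first.getThreadId());
                if (other == null || !Arrays.equals(trace, other.getStackTrace())) {
                    same = false;
                    break;
                }
            }
            if (same) {
                result.add(first);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // 5 samples, 5 seconds apart, as suggested above.
        for (ThreadInfo info : unchanged(sample(5, 5000))) {
            System.out.println(info.getThreadName() + " " + info.getThreadState()
                    + " at " + info.getStackTrace()[0]);
        }
    }
}
```

Idle container threads parked in a pool will also show up as unchanged, so the output still has to be narrowed down to the threads that sit in application code.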
When you run the thread dumps through Samurai (I haven't used TDA myself), it highlights these threads in red so you can quickly click on them and get to the lines showing issues.
See an example of this here. Look at the Samurai output image in that link. The Green cells are fine. Red and Grey cells need looking at.
A Samurai example from my own web app below shows a stuck sequence for Thread '19' across a span of 5 to 10 seconds:
> Thread dump 2/3
> "[ACTIVE] ExecuteThread: '19' for queue: 'weblogic.kernel.Default (self-tuning)'" daemon prio=7 tid=07b06000 nid=108 lwp_id=222813 waiting for monitor entry [2aa40000..2aa40b30]
>    java.lang.Thread.State: BLOCKED (on object monitor)
>     at com.bea.p13n.util.lease.JDBCLeaseManager.renewLease(JDBCLeaseManager.java:393)
>     - waiting to lock <735e9f88> (a com.bea.p13n.util.lease.JDBCLeaseManager)
>     at com.bea.p13n.util.lease.Lease$LeaseTimer.timerExpired(Lease.java:229)
...
> Thread dump 3/3
> "[ACTIVE] ExecuteThread: '19' for queue: 'weblogic.kernel.Default (self-tuning)'" daemon prio=7 tid=07b06000 nid=108 lwp_id=222813 waiting for monitor entry [2aa40000..2aa40b30]
>    java.lang.Thread.State: BLOCKED (on object monitor)
>     at com.bea.p13n.util.lease.JDBCLeaseManager.renewLease(JDBCLeaseManager.java:393)
>     - waiting to lock <735e9f88> (a com.bea.p13n.util.lease.JDBCLeaseManager)
>     at com.bea.p13n.util.lease.Lease$LeaseTimer.timerExpired(Lease.java:229)
Update: I recently used the Java Thread Dump Analyzer mentioned in this answer and, unlike Samurai, it's been very useful for Tomcat.
I know this is an old question but I just wrote a tool to help make long thread dumps more readable.
Java Thread Dump Analysis Tool
This tool groups threads together which have the same stack trace and allows you to only show threads which are in particular states (e.g. RUNNABLE or BLOCKED).
This makes it a bit quicker to find the interesting threads amongst the tens or hundreds of JBoss threads which spend most of their time waiting for work at the same place in the code and therefore all have the same stack trace.
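The grouping idea itself is simple enough to sketch. The snippet below is not the tool's actual implementation, only an illustration of the technique: it works on the live JVM through `ThreadMXBean` instead of a pasted text dump, and `StackGrouper` and `group` are hypothetical names.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.EnumSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: groups threads that share exactly the same stack trace.
class StackGrouper {

    /** Keeps only threads in the given states and groups their names by stack trace. */
    public static Map<List<StackTraceElement>, List<String>> group(
            ThreadInfo[] dump, Set<Thread.State> states) {
        Map<List<StackTraceElement>, List<String>> groups = new LinkedHashMap<>();
        for (ThreadInfo info : dump) {
            if (info == null || !states.contains(info.getThreadState())) {
                continue;
            }
            // A List of frames has value-based equals/hashCode, so it can be a map key.
            List<StackTraceElement> key = Arrays.asList(info.getStackTrace());
            groups.computeIfAbsent(key, k -> new ArrayList<>()).add(info.getThreadName());
        }
        return groups;
    }

    public static void main(String[] args) {
        ThreadInfo[] dump = ManagementFactory.getThreadMXBean().dumpAllThreads(false, false);
        Set<Thread.State> wanted = EnumSet.of(Thread.State.RUNNABLE, Thread.State.BLOCKED);
        group(dump, wanted).forEach((trace, names) -> {
            System.out.println(names.size() + " thread(s): " + names);
            for (StackTraceElement frame : trace) {
                System.out.println("    at " + frame);
            }
        });
    }
}
```

With a few hundred container threads, most of them usually collapse into a handful of groups, which leaves far fewer distinct stack traces to read.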
> I'm not sure if I should be looking at threads that are currently in "runnable" state only or if "waiting on condition" and "in Object.wait" are also important.
The latter two are actually the things to look for when diagnosing a deadlock, as you seem to be doing. "Runnable" means the thread is doing something right now (or waiting to get the CPU). "Blocked" and "waiting" are what deadlocks are made of.
Of course, an application container will have plenty of threads waiting legitimately. To filter out the interesting cases, look at the stack trace. If it's framework classes (and especially ones called "Worker" or "Queue") it's probably OK. If it's application code, you should look at it more closely.
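As a rough sketch of that filtering rule, the code below keeps only blocked or waiting threads whose stack contains a frame from a given package prefix, and also asks the JVM directly for deadlocked threads via `ThreadMXBean.findDeadlockedThreads()`. The class name `AppThreadFilter` and the prefix `com.example.` are placeholders, not anything from the question; substitute your own application's package.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: narrows a dump down to waiting threads that touch application code.
class AppThreadFilter {

    private static final Set<Thread.State> NOT_RUNNING =
            EnumSet.of(Thread.State.BLOCKED, Thread.State.WAITING, Thread.State.TIMED_WAITING);

    /** True if any frame of the thread's stack belongs to a class starting with `prefix`. */
    public static boolean touchesPackage(ThreadInfo info, String prefix) {
        for (StackTraceElement frame : info.getStackTrace()) {
            if (frame.getClassName().startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    /** Blocked or waiting threads that have at least one frame under `prefix`. */
    public static List<ThreadInfo> suspicious(ThreadInfo[] dump, String prefix) {
        List<ThreadInfo> result = new ArrayList<>();
        for (ThreadInfo info : dump) {
            if (info != null && NOT_RUNNING.contains(info.getThreadState())
                    && touchesPackage(info, prefix)) {
                result.add(info);
            }
        }
        return result;
    }

    /** IDs of threads the JVM reports as deadlocked (empty array if none). */
    public static long[] deadlocked() {
        long[] ids = ManagementFactory.getThreadMXBean().findDeadlockedThreads();
        return ids == null ? new long[0] : ids;
    }

    public static void main(String[] args) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        // "com.example." is an assumed application package, used here only for illustration.
        for (ThreadInfo info : suspicious(mx.dumpAllThreads(true, true), "com.example.")) {
            System.out.print(info);
        }
        System.out.println("Deadlocked thread count: " + deadlocked().length);
    }
}
```

A thread that passes this filter is not necessarily a problem; it is simply a much shorter list to compare across successive dumps.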