Java heap and thread analysis for memory leak

Posted 2019-07-07 02:57

Question:

My WebLogic server was configured with 16 GB of heap space, but it was 90% used within one hour of production usage, once most of the users had started work. I observed several stuck threads whenever this happened.

I captured a heap dump when the heap was approximately 10% free. How do I inspect the heap dump to find the memory leak, or the process or code that is causing this issue?

I have tried to understand the memory leak by running tools like jmap and Eclipse MAT, but, perhaps due to a lack of experience, I couldn't understand what these tools were trying to show. What should I be looking for, and how?

I have both the before-GC and after-GC heap dumps to analyze.

I have reviewed the thread dumps; there were no threads "waiting to lock" objects, and the threads were similar to the one shown below, stuck for no obvious reason.

Answer 1:

According to your heap dump, your biggest memory issue is the int arrays; indeed, they take nearly 70% of your heap (yes, sort by the Size column instead).

  1. Select it in your heap dump, right-click, and choose Show in Instances View.
  2. Then browse the biggest objects and, for each of them, right-click and select Show Nearest GC Root to see which object still holds a hard reference to the int array, preventing it from being eligible for GC.

This could help you find your memory leak, assuming that it is indeed a memory leak.

Below is an example of Show Nearest GC Root identifying a leak that I added intentionally to my program, just to show the idea. As you can see in the screenshot, I have an int array that cannot be eligible for GC because it is stored in a HashMap called leak in my class Application, so I know that my memory issue could be due to this particular HashMap, especially if many other objects lead to this HashMap.
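For reference, here is a minimal sketch of the kind of intentional leak described above; the class and method names are only illustrative, not taken from the real program:

import java.util.HashMap;
import java.util.Map;

public class Application {

    // Static field: reachable from a GC root for the lifetime of the class,
    // so nothing stored in it can ever be collected.
    private static final Map<String, int[]> leak = new HashMap<String, int[]>();

    // Each call parks a ~4 MB int array in the map and never removes it,
    // so heap usage only ever grows.
    public static void handleRequest(String requestId) {
        leak.put(requestId, new int[1_000_000]);
    }

    public static void main(String[] args) throws InterruptedException {
        long i = 0;
        while (true) {
            handleRequest("request-" + i++);
            Thread.sleep(10);
        }
    }
}

In a heap dump of such a program, Show Nearest GC Root on one of the int arrays points straight back to the leak map inside Application, which is exactly the kind of clue to look for.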

NB: Be patient when you try to identify a leak, as it is not always obvious. The ideal situation is a single huge object taking up the whole heap, but that is clearly not your case: nothing really stands out, which is why I propose investigating the int arrays first. Don't forget that the culprit could also be small int arrays, but thousands of them sharing the same nearest GC root.

Another trick: if you have JProfiler, you can simply follow this wonderful tutorial to find your leak.

Response Update:

One simple way to better identify the root cause of the memory leak is to take at least two heap dumps and then compare them using a tool like jhat, with the following syntax:

jhat -J-Xmx2G -baseline ${path-to-the-first-heap-dump} ${path-to-the-second-heap-dump}

It will launch a small HTTP server on port 7000, so:

  1. Launch http://localhost:7000/
  2. Then click on Show instance counts for all classes (including platform)

You will then see the list of classes ordered by the total number of new instances created. You can then use VisualVM to do what I described in the first part of my answer to find the root cause of your memory leak.
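If you still need to produce the heap dumps for this comparison, jmap can capture them from the running JVM; the PID and file path below are placeholders, and adding the live option restricts the dump to reachable objects only:

jmap -dump:format=b,file=/tmp/heap-after.hprof <pid>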

You can also use jhat:

  1. Select one of the top classes; then, for each of them,
  2. click on one "Reference to this Object",
  3. then click on Exclude weak refs.

You will then see the GC root of each instance, as in the next screenshot:

Another way is to use the Eclipse Memory Analyzer, also called MAT.

  1. Open the second snapshot with it
  2. Select the Histogram view
  3. Then, for each of the top classes, right-click
  4. Choose Merge Shortest Paths to GC Roots / Exclude all references

You will then see something like the next screenshot:



Answer 2:

The JDK's "jmap -histo" command will dump object counts and bytes for all classes to a text file. If you capture and compare a few of these dumps over time, you will see which classes grow continually; those are your memory leak. The overhead of -histo is much lower than that of capturing a full heap dump.
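To illustrate that capture-and-compare step, here is a small, hypothetical helper (not part of the JDK or of the tool mentioned below) that diffs two saved jmap -histo outputs and lists the classes whose instance counts grew between the two captures:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: diffs two "jmap -histo" captures saved as text files and
// prints the classes whose instance counts grew the most between the snapshots.
// Assumes the classic JDK 7/8 histogram layout: "   1:   268617   11247608  [C".
public class HistoDiff {

    public static void main(String[] args) throws IOException {
        Map<String, Long> before = parse(args[0]);
        Map<String, Long> after = parse(args[1]);

        // Compute the per-class growth in instance counts.
        List<Map.Entry<String, Long>> growth = new ArrayList<Map.Entry<String, Long>>();
        for (Map.Entry<String, Long> e : after.entrySet()) {
            Long old = before.get(e.getKey());
            long delta = e.getValue() - (old == null ? 0L : old);
            if (delta > 0) {
                growth.add(new AbstractMap.SimpleEntry<String, Long>(e.getKey(), delta));
            }
        }

        // Largest growth first: these classes are the leak suspects.
        Collections.sort(growth, new Comparator<Map.Entry<String, Long>>() {
            public int compare(Map.Entry<String, Long> a, Map.Entry<String, Long> b) {
                return b.getValue().compareTo(a.getValue());
            }
        });

        for (int i = 0; i < Math.min(20, growth.size()); i++) {
            System.out.printf("%-60s +%d instances%n",
                    growth.get(i).getKey(), growth.get(i).getValue());
        }
    }

    // Parses one histogram file into a map of class name -> instance count.
    private static Map<String, Long> parse(String file) throws IOException {
        Map<String, Long> counts = new HashMap<String, Long>();
        BufferedReader reader = new BufferedReader(new FileReader(file));
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                String[] cols = line.trim().split("\\s+");
                // Data rows look like "1:  268617  11247608  [C"; skip headers and totals.
                if (cols.length >= 4 && cols[0].endsWith(":")) {
                    counts.put(cols[3], Long.parseLong(cols[1]));
                }
            }
        } finally {
            reader.close();
        }
        return counts;
    }
}

For example, redirect two "jmap -histo <pid>" runs to files a few minutes apart, then run this class with the two file names as arguments; the classes at the top of its output are the leak suspects.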

Comparing just a few dumps (as the Python script detailed here does) seems like too small a sample, so I wrote an open-source tool (here) that runs this jmap -histo command in the background at an interval. It has a live display and tracks the percentage of time that the byte count for each class is on the rise.



Answer 3:

It seems you probably have a memory leak. Your best approach is to use Java Mission Control with Flight Recorder to identify the leaking class and method.

You should set up your WebLogic managed server with the following parameters:

-Dcom.sun.management.jmxremote 
-Dcom.sun.management.jmxremote.port=8999 
-Dcom.sun.management.jmxremote.ssl=false
-Dcom.sun.management.jmxremote.authenticate=false 
-XX:+UnlockCommercialFeatures 
-XX:+FlightRecorder

Once this is set up, follow the instructions here to detect the leak.
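If you prefer the command line over the Java Mission Control UI, a recording can also be started with jcmd, assuming Oracle JDK 7u40 or later and the commercial-features flags above; the PID, duration and file name are placeholders:

jcmd <pid> JFR.start duration=120s filename=/tmp/leak-recording.jfr

You can then open the resulting .jfr file in Java Mission Control and follow the same leak-detection steps.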

Hope it helps !!



Answer 4:

I am one of the developers of a tool called Plumbr. Among other things, it performs automatic analysis of heap contents in cases of excessive memory usage. You may find it useful.



Answer 5:

Per your comments: you have Java 7 with a 16 GB heap and no GC algorithm explicitly specified, so you get the Java 7 default, the throughput collector, which is not suitable for most web apps because it leads to long GC pauses on big heaps.

Switch to the ConcurrentMarkSweep (CMS) collector; that way the GC will not wait until your memory fills up, and will do its best to collect garbage concurrently, so that you have fewer stop-the-world pauses.
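A possible starting point, assuming a HotSpot JVM on Java 7 (the occupancy threshold is just an example value to tune for your workload):

-XX:+UseConcMarkSweepGC
-XX:+UseParNewGC
-XX:CMSInitiatingOccupancyFraction=70
-XX:+UseCMSInitiatingOccupancyOnly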



Answer 6:

Did you try the YourKit profiler? It's not free, but you can evaluate it for 30 days. If your dump contains all objects (not only live ones), you will be able to check their GC roots as well, because it could be that you don't have a memory leak but simply too big a memory footprint. It would also be useful to enable GC logs and count how many full GC pauses you have:

grep "Full GC" jvm_gc.log | wc -l

In an ideal world it should be 0 :)
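If GC logging is not enabled yet, these standard HotSpot flags (Java 7 syntax) produce the jvm_gc.log file that the grep above expects:

-verbose:gc
-XX:+PrintGCDetails
-XX:+PrintGCDateStamps
-Xloggc:jvm_gc.log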

By the way, this whole article could be helpful for you.