As always, a lengthy problem description.
We are currently stress testing our product, and we now face a strange problem. After one to two hours, heap space begins to grow, and the application dies some time later.
Profiling the application shows a very large number of Finalizer objects filling the heap. Well, we thought this "might be the weird finalizer thread too slow" issue and reviewed the code to reduce the number of objects that need to be finalized (JNA native handles in this case). That was a good idea anyway and eliminated thousands of new objects...
The next tests showed the same pattern, only one hour later and not so steep. This time the Finalizers originated from the FileInputStreams and FileOutputStreams that are heavily used in the testbed. All resources are closed, but the Finalizers are not cleaned up anymore.
I have no idea why after 1 or 2 hours (no exceptions) the FinalizerThread seems suddenly to stop working. If we force System.runFinalization() by hand in some of our threads, the profiler shows that the finalizers are cleaned up. Resuming the test immediately causes new heap allocation for Finalizers.
The FinalizerThread is still there; jConsole reports it as WAITING.
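To watch the backlog without a profiler attached, the pending finalization count can be read from the standard memory MXBean. This is only a minimal sketch: the class name `FinalizerBacklog` and its methods are made up for illustration, the count is approximate by specification, and `System.runFinalization()` is a best-effort request that is deprecated for removal on recent JDKs.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;

// Hypothetical helper, not part of the application described above.
final class FinalizerBacklog {
    private FinalizerBacklog() {}

    /** Approximate number of objects whose finalization is still pending. */
    static int pending() {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        return memory.getObjectPendingFinalizationCount();
    }

    /** Samples the backlog, asks the JVM to run finalizers, and samples again. */
    static int[] drainOnce() {
        int before = pending();
        System.runFinalization(); // best-effort request, no guarantee
        int after = pending();
        System.out.printf("pending finalization: before=%d after=%d%n", before, after);
        return new int[] {before, after};
    }
}
```

Logging `FinalizerBacklog.pending()` periodically from a housekeeping thread would show whether the queue grows steadily or only while a particular agent is attached.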
EDIT
First, inspecting the heap with HeapAnalyzer revealed nothing new or strange. HeapAnalyzer has some nice features, but I had my difficulties with it at first. I'm using jProfiler, which comes with nice heap inspection tools, and I will stay with it.
Maybe I'm missing some killer features in HeapAnalyzer?
Second, today we set up the tests with a debug connection instead of the profiler, and the system has been stable for nearly 5 hours now. This seems to be a very strange combination of too many Finalizers (which were reduced in the first review), the profiler, and the VM's GC strategies. As everything runs fine at the moment, there are no real insights...
Thanks for the input so far. Maybe you will stay tuned and interested (now that you may have more reason to believe that we are not talking about a simple programming fault).
I want to close this question with a summary of the current state.
The last test is now up over 60 hours without any problems. That leads us to the following summary/conclusions:
- We have a high-throughput server using lots of objects that ultimately implement "finalize". These objects are mostly JNA memory handles and file streams. Because Finalizers are created faster than the GC and the finalizer thread can clean them up, the process fails after ~3 hours. This is a well-known phenomenon (-> google).
- We did some optimizations so the server got rid of nearly all the JNA Finalizers. This version was tested with jProfiler attached.
- The server died some hours later than our initial attempt...
- The profiler showed a huge number of finalizers, this time caused almost exclusively by file streams. This queue was not cleaned up, even after pausing the server for some time.
- Only after manually triggering "System.runFinalization()", the queue was emptied. Resuming the server started to refill...
- This is still inexplicable. We now guess this is some profiler interaction with GC/finalization.
- To debug what could be the reason for the inactive finalizer thread we detached the profiler and attached the debugger this time.
- The system was running without noticeable defect... FinalizerThread and GC all "green".
- We resumed the test (now for the first time again without any agents besides jConsole attached) and it has been up and fine for over 60 hours now. So apparently the initial JNA refactoring solved the issue, and only the profiling session added some indeterminism (greetings from Heisenberg).
Other strategies for managing the finalizers are for example discussed in http://cleversoft.wordpress.com/2011/05/14/out-of-memory-exception-from-finalizer-object-overflow/ (besides the not overly clever "don't use finalizers"..).
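One such strategy is to release native resources explicitly instead of relying on finalize(). The sketch below is not our actual JNA code: `NativeHandle` and the `Runnable` standing in for the native free call are hypothetical, and the only point is that an object without a finalize() method never enters the finalizer queue.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for a native memory handle; "release" represents
// whatever native free call the real handle would make.
final class NativeHandle implements AutoCloseable {
    private final AtomicBoolean released = new AtomicBoolean(false);
    private final Runnable release;

    NativeHandle(Runnable release) {
        this.release = release;
    }

    boolean isReleased() {
        return released.get();
    }

    @Override
    public void close() {
        // compareAndSet makes close() idempotent: the native call runs once.
        if (released.compareAndSet(false, true)) {
            release.run();
        }
    }
}
```

Used in a try-with-resources block, the handle is freed deterministically when the block exits, so neither the GC nor the finalizer thread is involved in the cleanup.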
Thanks for all your input.
It is difficult to give a specific answer to your dilemma, but take a heap dump and run it through IBM's HeapAnalyzer. Search for "heap analyzer" at http://www.ibm.com/developerworks (the direct link keeps changing). It seems highly unlikely that the finalizer thread "suddenly stops working" if you are not overriding finalize.
It is possible for the Finalizer to be blocked, but I don't know how it could simply die.
If you have lots of FileInputStream and FileOutputStream finalize() methods pending, this may indicate that you are not closing your files correctly. Make sure that these streams are always closed in a finally block, or use Java 7's ARM (Automatic Resource Management).
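As a minimal sketch of the try-with-resources form (Java 7 and later), assuming a simple file copy: the class and method names here are made up for illustration. Both streams are closed in reverse order of declaration when the block exits, whether normally or through an exception.

```java
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical helper used only to illustrate try-with-resources.
final class StreamCopy {
    private StreamCopy() {}

    /** Copies source to target and returns the number of bytes copied. */
    static long copy(String source, String target) throws IOException {
        try (InputStream in = new FileInputStream(source);
             OutputStream out = new FileOutputStream(target)) {
            byte[] buffer = new byte[8192];
            long total = 0;
            int read;
            while ((read = in.read(buffer)) != -1) {
                out.write(buffer, 0, read);
                total += read;
            }
            return total;
        } // out is closed first, then in, even if an exception was thrown
    }
}
```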
You wrote that jConsole shows the FinalizerThread as WAITING.
To be WAITING it has to be waiting on an object.
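To see which object that is, you could ask the thread MXBean from inside the application. This is a rough sketch: the class name `FinalizerThreadProbe` is made up, and the thread name "Finalizer" is an assumption based on what HotSpot normally calls this thread. Note that an idle finalizer thread typically shows up as WAITING on the lock of its reference queue, so WAITING alone does not prove that it is stuck.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical diagnostic helper.
final class FinalizerThreadProbe {
    private FinalizerThreadProbe() {}

    /** Returns the finalizer thread's state and the lock it is waiting on, if any. */
    static String describe() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        for (ThreadInfo info : threads.dumpAllThreads(false, false)) {
            // "Finalizer" is the usual HotSpot thread name (assumption).
            if (info != null && "Finalizer".equals(info.getThreadName())) {
                return info.getThreadState() + " on " + info.getLockName();
            }
        }
        return "Finalizer thread not found";
    }
}
```

If the reported lock is something other than a reference queue lock, for example a monitor owned by one of your own classes, that would point to a finalize() method that is blocking.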
Both FileInputStream and FileOutputStream have the same comment in their finalize() methods:
. . .
/*
* Finalizer should not release the FileDescriptor if another
* stream is still using it. If the user directly invokes
* close() then the FileDescriptor is also released.
*/
runningFinalize.set(Boolean.TRUE);
. . .
which means your Finalizer may be waiting for a stream to be released. That in turn means that, as Joop Eggen mentioned, your app may be doing something bad when closing one of the streams.
My guess: it is an overridden close in your own stream (wrapper) classes. As the stream classes are often wrappers that delegate to others, I could imagine that a nested new A(new B(new C())) might cause some wrong logic on close. You should look for closing twice and for close calls that are not delegated. And maybe there is still some forgotten close (close on the wrong object?).
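For comparison, here is a minimal sketch of a wrapper whose close delegates to the wrapped stream exactly once, no matter how often it is called. The class name `CloseOnceInputStream` is hypothetical and only illustrates the pattern to check your own wrappers against.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical wrapper: close() reaches the wrapped stream exactly once.
final class CloseOnceInputStream extends FilterInputStream {
    private final AtomicBoolean closed = new AtomicBoolean(false);

    CloseOnceInputStream(InputStream delegate) {
        super(delegate);
    }

    @Override
    public void close() throws IOException {
        // Only the first call delegates; later calls are no-ops.
        if (closed.compareAndSet(false, true)) {
            in.close(); // "in" is the wrapped stream held by FilterInputStream
        }
    }
}
```

With such wrappers nested as new A(new B(new C())), closing the outermost stream once is enough to close the whole chain, and an accidental second close does no harm.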
With a slow growing heap the Java garbage collector can run out of memory when it tries to belatedly garbage collect in a low memory situation. Try turning on the concurrent mark and sweep garbage collection with -XX:+UseConcMarkSweepGC and see if your problem goes away.