I'm working on a huge legacy Java application, with a lot of handwritten code for things you'd nowadays let a framework handle.
The problem I'm facing right now is that we are running out of file handles on our Solaris server. What's the best way to track open file handles? Where should I look, and what can cause open file handles to run out?
I cannot debug the application under Solaris, only in my Windows development environment. Is it even reasonable to analyze the open file handles under Windows?
Google for an app called FileMon from Sysinternals.
BTW, to track this down you may be able to use something like AspectJ to log all calls that open and close files, and record where they occur.
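For instance, a minimal @AspectJ-style sketch of that idea (the aspect name and logging are mine, and it assumes AspectJ weaving is already configured for your application classes):

    import org.aspectj.lang.JoinPoint;
    import org.aspectj.lang.annotation.Aspect;
    import org.aspectj.lang.annotation.Before;

    // Logs every construction of the two basic file stream classes made from
    // application code, and every close() on a stream-typed reference,
    // together with the source location of the call.
    @Aspect
    public class FileHandleTracer {

        @Before("call(java.io.FileInputStream.new(..)) || call(java.io.FileOutputStream.new(..))")
        public void logOpen(JoinPoint jp) {
            System.err.println("OPEN  " + jp.getSourceLocation());
        }

        // Matching on InputStream+/OutputStream+ also catches closes made
        // through a base-typed reference.
        @Before("call(* java.io.InputStream+.close()) || call(* java.io.OutputStream+.close())")
        public void logClose(JoinPoint jp) {
            System.err.println("CLOSE " + jp.getSourceLocation());
        }
    }

Diffing the OPEN and CLOSE lines afterwards points you at open sites that never get a matching close.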
This little script helps me keep an eye on the count of open files when I need to test it. It was used on Linux, so you may have to adapt it for Solaris (maybe :) )
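The script itself isn't reproduced here, but the idea on Linux is just to poll the process's fd directory; a rough Java equivalent of the same idea (hypothetical, and it assumes a /proc filesystem, so adjust for Solaris as needed):

    import java.io.File;

    // Prints the number of entries in /proc/<pid>/fd every five seconds.
    // Pass a PID as the first argument, or omit it to watch this JVM itself.
    public class FdCountWatcher {
        public static void main(String[] args) throws InterruptedException {
            String pid = args.length > 0 ? args[0] : "self";
            File fdDir = new File("/proc/" + pid + "/fd");
            while (true) {
                String[] fds = fdDir.list();
                System.out.println(System.currentTimeMillis() + " open fds: "
                        + (fds == null ? "n/a" : fds.length));
                Thread.sleep(5000);
            }
        }
    }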
Also you can try playing with the JVM option -Xverify:none - it should disable jar verification (which helps if most of the open files are jars...). And think about increasing the file limit - up-to-date *nix kernels have no problem handling more than 1024 file descriptors.

For leaks through unclosed FileOutputStreams you can use FindBugs (mentioned in another answer), or try to find the article on how to patch the standard Java FileOutputStream/FileInputStream so that you can see who opens files and forgets to close them. Unfortunately I can't find that article right now, but it does exist :)
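I can't reproduce that article, but the usual trick is a drop-in subclass that records a stack trace when the stream is opened and complains if it is finalized without ever being closed; a rough sketch (the class name is mine, and it targets older JDKs where FileInputStream still declares finalize()):

    import java.io.FileInputStream;
    import java.io.FileNotFoundException;
    import java.io.IOException;

    // Use this in place of FileInputStream while debugging: it remembers
    // where it was opened and prints that stack trace if it gets
    // garbage-collected without close() ever having been called.
    public class TrackedFileInputStream extends FileInputStream {
        private final Throwable openedAt = new Throwable("opened here");
        private volatile boolean closed;

        public TrackedFileInputStream(String name) throws FileNotFoundException {
            super(name);
        }

        @Override
        public void close() throws IOException {
            closed = true;
            super.close();
        }

        @Override
        protected void finalize() throws IOException {
            if (!closed) {
                System.err.println("Leaked file handle:");
                openedAt.printStackTrace();
            }
            super.finalize();
        }
    }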
One good thing I've found for tracking down unclosed file handles is FindBugs:
http://findbugs.sourceforge.net/
It checks many things, but one of the most useful is detecting resource open/close problems. It's a static analysis tool that runs against your compiled classes, and it's also available as an Eclipse plugin.
I would double-check the environment settings on your Solaris box. I believe that by default Solaris only allows 256 file handles per process. For a server application, especially if it's running on a dedicated server, this is very low. Figure 50 or more descriptors for opening JRE and library JARs, and then at least one descriptor for each incoming request and database query, probably more, and you can see how this just won't cut the mustard for a serious server.
Have a look at the /etc/system file, for the values of rlim_fd_cur and rlim_fd_max, to see what your system has set. Then consider whether this is reasonable (you can see how many file descriptors are open while the server is running with the lsof command, ideally with the -p [process ID] parameter).

I would start by asking my sysadmin to get a listing of all open file descriptors for the process. Different systems do this in different ways: Linux, for example, has the /proc/PID/fd directory. I recall that Solaris has a command (maybe pfiles?) that will do the same thing -- your sysadmin should know it.

However, unless you see a lot of references to the same file, an fd list isn't going to help you. If it's a server process, it probably has lots of files (and sockets) open for a reason. The only way to resolve the problem is to adjust the system limit on open files -- you can also check the per-user limit with ulimit, but in most current installations that equals the system limit.
This may not be practical in your case, but what I did once when I had a similar problem with open database connections was to override the "open" function with my own. (Conveniently I already had this function because we had written our own connection pooling.) In my function I then added an entry to a table recording the open. I made a stack trace call and saved the identity of the caller, along with the time of the call and I forget what else. When the connection was released, I deleted the table entry. Then I had a screen where we could dump the list of open entries. You could then look at the timestamps and easily see which connections had been open for unlikely amounts of time, and which functions had done these opens.
From this we were able to quickly track down the couple of functions that were opening connections and failing to close them.
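Applied to file handles rather than connections, a rough sketch of that bookkeeping (all names here are mine, not from the original code) could look like this:

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.atomic.AtomicLong;

    // Every open records who opened it and when; every close removes the
    // entry; dump() lists whatever is still open, with its opening stack trace.
    public class OpenHandleRegistry {
        private static final AtomicLong IDS = new AtomicLong();
        private static final Map<Long, Entry> OPEN = new ConcurrentHashMap<Long, Entry>();

        static class Entry {
            final long openedAt = System.currentTimeMillis();
            final Throwable openedBy = new Throwable("opened by");
        }

        // Call from your own open() wrapper; keep the returned id with the handle.
        public static long recordOpen() {
            long id = IDS.incrementAndGet();
            OPEN.put(id, new Entry());
            return id;
        }

        // Call from your own close() wrapper with the id returned by recordOpen().
        public static void recordClose(long id) {
            OPEN.remove(id);
        }

        // The oldest remaining entries are the likely leaks.
        public static void dump() {
            for (Map.Entry<Long, Entry> e : OPEN.entrySet()) {
                System.err.println("Still open since " + new java.util.Date(e.getValue().openedAt));
                e.getValue().openedBy.printStackTrace();
            }
        }
    }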
If you have lots of open file handles, the odds are that you're failing to close them somewhere when you're done. You say you've checked for proper try/finally blocks, but I'd suspect that somewhere in the code you either missed a bad one, or you have a function that hangs and never makes it to the finally. I suppose it's also possible that you really are doing proper closes every time you open a file, but you are opening hundreds of files simultaneously. If that's the case, I'm not sure what you can do other than a serious program redesign to manipulate fewer files, or a serious program redesign to queue your file accesses. (At this point I add the usual "without knowing the details of your application" caveat, etc.)