Question:
I'm working on a huge legacy Java application, with a lot of handwritten stuff, which nowadays you'd let a framework handle.
The problem I'm facing right now is that we are running out of file handles on our Solaris server. I'd like to know what the best way to track open file handles is. Where should I look, and what can cause open file handles to run out?
I cannot debug the application under Solaris, only in my Windows development environment. Is it even reasonable to analyze the open file handles under Windows?
Answer 1:
One good thing I've found for tracking down unclosed file handles is FindBugs:
http://findbugs.sourceforge.net/
It checks many things, but one of the most useful is detecting resource open/close problems. It's a static analysis program that runs on your source code, and it's also available as an Eclipse plugin.
Answer 2:
On Windows you can look at open file handles using Process Explorer:
http://technet.microsoft.com/en-us/sysinternals/bb896653.aspx
On Solaris you can use "lsof" to monitor the open file handles.
Answer 3:
It's worth bearing in mind that open sockets also consume file handles on Unix systems. So it could very well be something like a database connection pool leak (e.g. open database connections not being closed and returned to the pool) that is leading to this issue - certainly I have seen this error caused by a connection pool leak before.
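For example, such a leak often looks roughly like the sketch below (the pool, table, and method names are just placeholders): the connection is taken from the pool but never closed, so its socket - and therefore a descriptor - stays open.
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import javax.sql.DataSource;

public class ConnectionLeakExample {
    static int countRows(DataSource pool) throws SQLException {
        Connection con = pool.getConnection();
        Statement st = con.createStatement();
        ResultSet rs = st.executeQuery("SELECT COUNT(*) FROM some_table");
        rs.next();
        int count = rs.getInt(1);
        // con.close() is never called, so the connection (and its socket)
        // is never returned to the pool -- one leaked descriptor per call.
        return count;
    }
}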
Answer 4:
To answer the second part of the question:
what can cause open file handles to run out?
Opening a lot of files, obviously, and then not closing them.
The simplest scenario is that the references to whatever objects hold the native handles (e.g., FileInputStream) are thrown away before being closed, which means the files remain open until the objects are finalized.
The other option is that the objects are stored somewhere and not closed. A heap dump might be able to tell you what lingers where (jmap and jhat are included in the JDK, or you can use jvisualvm if you want a GUI). You're probably interested in looking for objects owning FileDescriptors.
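To illustrate that first scenario, a minimal sketch (the path parameter is just a placeholder): the stream is opened but never closed, so its descriptor stays allocated until the object happens to be finalized.
import java.io.FileInputStream;
import java.io.IOException;

public class LeakExample {
    static int readFirstByte(String path) throws IOException {
        // Opened but never closed: once the local reference goes out of scope,
        // the file descriptor stays allocated until the stream is finalized,
        // which may be much later (or effectively never under low GC pressure).
        FileInputStream in = new FileInputStream(path);
        return in.read();
    }
}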
Answer 5:
This little script helped me keep an eye on the count of opened files when I needed to test the descriptor count.
It was used on Linux, so for Solaris you may have to patch it (maybe :) )
#!/bin/bash
COUNTER=0
HOW_MANY=0
MAX=0
# COUNTER is just a flag that tells the loop whether to continue or not
while [ $COUNTER -lt 10 ]; do
    # run while the process with the passed pid is alive
    if [ -r "/proc/$1" ]; then
        # count how many files are currently open
        HOW_MANY=`/usr/sbin/lsof -p $1 | wc -l`
        # output for live monitoring
        echo `date +%H:%M:%S` $HOW_MANY
        # uncomment if you want to save statistics
        #/usr/sbin/lsof -p $1 > ~/autocount/config_lsof_`echo $HOW_MANY`_`date +%H_%M_%S`.txt
        # look for the max value
        if [ $MAX -lt $HOW_MANY ]; then
            let MAX=$HOW_MANY
            echo new max is $MAX
        fi
        # test every second; if you don't need such frequent tests, increase this value
        sleep 1
    else
        echo max count is $MAX
        echo Process was finished
        let COUNTER=11
    fi
done
You can also try to play with the JVM option -Xverify:none - it should disable JAR verification (if most of the opened files are JARs...).
For leaks through unclosed FileOutputStreams you can use FindBugs (mentioned above), or try to find the article on how to patch the standard Java FileOutputStream/FileInputStream so you can see who opens files and forgets to close them. Unfortunately, I cannot find that article right now, but it does exist :)
Also think about increasing the file limit - for up-to-date *nix kernels it is not a problem to handle more than 1024 fds.
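I can't reproduce that article, but the idea is roughly an instrumented stream like the sketch below (the class name and logging are my own assumptions, not the article's code): each stream remembers where it was opened and complains during finalization if it was never closed.
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;

public class TracedFileInputStream extends FileInputStream {
    // Captures the stack trace of the code that opened the stream.
    private final Throwable openedAt = new Throwable("opened here");
    private volatile boolean closed = false;

    public TracedFileInputStream(String name) throws FileNotFoundException {
        super(name);
    }

    @Override
    public void close() throws IOException {
        closed = true;
        super.close();
    }

    @Override
    protected void finalize() throws IOException {
        if (!closed) {
            System.err.println("Stream was opened but never closed; opened at:");
            openedAt.printStackTrace();
        }
        super.finalize();
    }
}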
Answer 6:
This may not be practical in your case, but what I did once when I had a similar problem with open database connections was override the "open" function with my own. (Conveniently I already had this function because we had written our own connection pooling.) In my function I then added an entry to a table recording the open. I did a stack trace call and saved the identity of the caller, along with the time of the call and I forget what else. When the connection was released, I deleted the table entry. Then I had a screen where we could dump the list of open entries. You could then look at the timestamp and easily see which connections had been open for unlikely amounts of time, and which functions had done these opens.
From this we were able to quickly track down the couple of functions that were opening connections and failing to close them.
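A rough sketch of that kind of open-entry table (all names here are hypothetical, not the code from that system):
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical registry: records who opened each resource and when,
// so long-lived entries can be dumped and inspected later.
public class OpenResourceRegistry {
    private static final Map<Object, String> OPEN =
            new ConcurrentHashMap<Object, String>();

    public static void recordOpen(Object resource) {
        // getStackTrace()[1] is the code that called recordOpen()
        StackTraceElement caller = new Throwable().getStackTrace()[1];
        OPEN.put(resource, System.currentTimeMillis() + " opened by " + caller);
    }

    public static void recordClose(Object resource) {
        OPEN.remove(resource);
    }

    public static void dump() {
        for (Map.Entry<Object, String> e : OPEN.entrySet()) {
            System.out.println(e.getValue() + " -> " + e.getKey());
        }
    }
}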
If you have lots of open file handles, the odds are that you're failing to close them somewhere when you're done. You say you've checked for proper try/finally blocks, but I'd suspect that somewhere in the code you either missed a bad one, or you have a function that hangs and never makes it to the finally. I suppose it's also possible that you really are doing proper closes every time you open a file, but you are opening hundreds of files simultaneously. If that's the case, I'm not sure what you can do other than a serious program redesign to manipulate fewer files, or a serious program redesign to queue your file accesses. (At this point I add the usual, "Without knowing the details of your application, etc.")
Answer 7:
I would start by asking my sysadmin to get a listing of all open file descriptors for the process. Different systems do this in different ways: Linux, for example, has the /proc/PID/fd directory. I recall that Solaris has a command (maybe pfiles?) that will do the same thing -- your sysadmin should know it.
However, unless you see a lot of references to the same file, an fd list isn't going to help you. If it's a server process, it probably has lots of files (and sockets) open for a reason. The only way to resolve the problem is to adjust the system limit on open files -- you can also check the per-user limit with ulimit, but in most current installations that equals the system limit.
Answer 8:
Not a direct answer to your question, but these problems could be the result of releasing file resources incorrectly in your legacy code. For example, if you're working with FileOutputStream classes, make sure the close methods are called in a finally block, as in this example:
FileOutputStream out = null;
try {
    // Your file handling code
} catch (IOException e) {
    // Handle the exception
} finally {
    if (out != null) {
        try { out.close(); } catch (IOException e) { }
    }
}
Answer 9:
I would double-check the environment settings on your Solaris box. I believe that by default Solaris only allows 256 file handles per process. For a server application, especially if it's running on a dedicated server, this is very low. Figure 50 or more descriptors for open JRE and library JARs, and then at least one descriptor for each incoming request and database query, probably more, and you can see how this just won't cut the mustard for a serious server.
Have a look at the /etc/system file, for the values of rlim_fd_cur and rlim_fd_max, to see what your system has set. Then consider whether this is reasonable (you can see how many file descriptors are open while the server is running with the lsof command, ideally with the -p [process ID] parameter).
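If you decide to raise the limits, the /etc/system entries look roughly like the following (the values are only illustrative, and a reboot is needed for changes to /etc/system to take effect):
* illustrative per-process file descriptor limits
set rlim_fd_cur = 1024
set rlim_fd_max = 4096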
Answer 10:
It could certainly give you an idea. Since it's Java, the file open/close mechanics should be implemented similarly (unless one of the JVMs is implemented incorrectly). I would recommend using File Monitor on Windows.
Answer 11:
Google for an app called FileMon from Sysinternals.
BTW, to track this down you may be able to use something like AspectJ to log all calls that open and close files, and where they occur.
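As a sketch of that AspectJ idea (assuming compile-time or load-time weaving is already set up; the aspect name is mine), something like this would report every place a file stream is constructed:
public aspect FileOpenLogger {
    // Fires after any FileInputStream or FileOutputStream constructor completes.
    after() returning: call(java.io.FileInputStream.new(..))
                    || call(java.io.FileOutputStream.new(..)) {
        System.err.println("File stream opened at:");
        new Exception("open site").printStackTrace();
    }
}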
Answer 12:
This is a coding pattern that helps find unclosed resources. It closes the resources and also complains in the log about the problem.
import java.io.Closeable;
import java.io.IOException;

class TrackedFile implements Closeable {
    private boolean closed = false;
    private final Closeable file;   // the underlying stream being wrapped

    TrackedFile(Closeable file) {
        this.file = file;
    }

    public void close() throws IOException {
        closed = true;
        file.close();
    }

    protected void finalize() throws Throwable {
        if (!closed) {
            System.err.println("OI! YOU FORGOT TO CLOSE A FILE!");
            file.close();
        }
        super.finalize();
    }
}
Wrap the above file.close() calls in try-catch blocks that ignore errors.
Also, Java 7 has a new 'try-with-resources' feature that can auto-close resources.
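A minimal try-with-resources sketch (the file name is only a placeholder); the stream is closed automatically when the block exits, even if an exception is thrown:
import java.io.FileInputStream;
import java.io.IOException;

public class TryWithResourcesExample {
    public static void main(String[] args) throws IOException {
        try (FileInputStream in = new FileInputStream("example.txt")) {
            System.out.println("first byte: " + in.read());
        }
    }
}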