Too many open file handles

Posted 2019-01-23 18:06

I'm working on a huge legacy Java application, with a lot of handwritten stuff, which nowadays you'd let a framework handle.

The problem I'm facing right now is that we are running out of file handles on our Solaris server. What's the best way to track down open file handles? Where should I look, and what can cause open file handles to run out?

I cannot debug the application under Solaris, only in my Windows development environment. Is it even reasonable to analyze the open file handles under Windows?

12 answers
兄弟一词,经得起流年.
Answer 2 · 2019-01-23 18:11

Google for an app called FileMon from Sysinternals.

BTW, to track this down you may be able to use something like AspectJ to log all calls that open and close files, and to log where they occur.
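For instance, here is a minimal @AspectJ-style sketch of that logging idea. It is illustrative only and assumes AspectJ compile-time or load-time weaving of your own code; the class name is made up:

```java
import org.aspectj.lang.JoinPoint;
import org.aspectj.lang.annotation.Aspect;
import org.aspectj.lang.annotation.Before;

@Aspect
public class FileHandleLoggingAspect {

    // Log every construction of a FileInputStream/FileOutputStream,
    // along with where in the woven code it happened.
    @Before("call(java.io.FileInputStream.new(..)) || call(java.io.FileOutputStream.new(..))")
    public void logOpen(JoinPoint jp) {
        System.err.println("OPEN  at " + jp.getSourceLocation());
    }

    // Log every close() on any java.io.Closeable implementation.
    @Before("call(* java.io.Closeable+.close())")
    public void logClose(JoinPoint jp) {
        System.err.println("CLOSE at " + jp.getSourceLocation());
    }
}
```

Pairing the OPEN lines against the CLOSE lines in the log should point you at the call sites that open without closing.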

疯言疯语
Answer 3 · 2019-01-23 18:13

This little script helps me keep an eye on the count of open files when I need to test the descriptor count. I used it on Linux, so for Solaris you may have to patch it (maybe :) ). Pass the PID of the process to watch as the first argument.

#!/bin/bash
COUNTER=0
HOW_MANY=0
MAX=0
# COUNTER is just a flag that tells the loop whether to continue or not
while [ $COUNTER -lt 10 ]; do
    # run while the process with the passed PID is alive
    if [ -r "/proc/$1" ]; then
        # count how many files are open (includes lsof's header line)
        HOW_MANY=`/usr/sbin/lsof -p "$1" | wc -l`
        # output for live monitoring
        echo `date +%H:%M:%S` $HOW_MANY
        # uncomment if you want to save the statistics
        #/usr/sbin/lsof -p "$1" > ~/autocount/config_lsof_`echo $HOW_MANY`_`date +%H_%M_%S`.txt

        # track the maximum value
        if [ $MAX -lt $HOW_MANY ]; then
            let MAX=$HOW_MANY
            echo new max is $MAX
        fi
        # test every second; if you don't need such frequent sampling, increase this value
        sleep 1
    else
        echo max count is $MAX
        echo Process finished
        let COUNTER=11
    fi
done

You can also try playing with the JVM option -Xverify:none, which disables bytecode verification (worth a look if most of the open files are JARs...). For leaks through unclosed FileOutputStreams you can use FindBugs (mentioned in another answer), or try to find the article on how to patch the standard java.io.FileInputStream/FileOutputStream classes so that you can see who opens files and forgets to close them. Unfortunately I cannot find that article right now, but it does exist :) Also think about increasing the file limit -- for up-to-date *nix kernels it is not a problem to handle more than 1024 fds.
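A minimal sketch of that patching idea, assuming you can swap your own class in at the call sites that construct streams (the class name and the dump method are invented for illustration): record a stack trace at construction time, drop it on close, and dump whatever is left.

```java
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative name; use it wherever you currently construct FileInputStream.
public class TrackedFileInputStream extends FileInputStream {
    private static final Map<TrackedFileInputStream, Throwable> OPEN =
            new ConcurrentHashMap<TrackedFileInputStream, Throwable>();

    public TrackedFileInputStream(String name) throws FileNotFoundException {
        super(name);
        // Remember who opened us: the Throwable captures the caller's stack.
        OPEN.put(this, new Throwable("opened " + name));
    }

    @Override
    public void close() throws IOException {
        OPEN.remove(this);
        super.close();
    }

    // Call this when the fd count climbs: every trace printed here is a
    // stream that was opened but never closed.
    public static void dumpOpenStreams() {
        for (Throwable t : OPEN.values()) {
            t.printStackTrace();
        }
    }
}
```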

forever°为你锁心
Answer 4 · 2019-01-23 18:14

One good thing I've found for tracking down unclosed file handles is FindBugs:

http://findbugs.sourceforge.net/

It checks many things, but one of the most useful is resource open/close operations. It's a static-analysis tool that runs on your compiled code, and it's also available as an Eclipse plugin.
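For example, FindBugs' open-stream detectors flag patterns like the first method below, where the handle leaks if readLine() throws; the try-with-resources variant (Java 7+) closes it on every path:

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class StreamExamples {
    // Leaky: if readLine() throws, the reader (and its file handle) is never closed.
    static String firstLineLeaky(String path) throws IOException {
        BufferedReader in = new BufferedReader(new FileReader(path));
        String line = in.readLine();
        in.close();
        return line;
    }

    // Safe: try-with-resources closes the reader on every exit path.
    static String firstLineSafe(String path) throws IOException {
        try (BufferedReader in = new BufferedReader(new FileReader(path))) {
            return in.readLine();
        }
    }
}
```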

混吃等死
Answer 5 · 2019-01-23 18:17

I would double-check the environment settings on your Solaris box. I believe that by default Solaris only allows 256 file handles per process. For a server application, especially if it's running on a dedicated server, this is very low. Figure 50 or more descriptors for opening JRE and library JARs, and then at least one descriptor for each incoming request and database query, probably more, and you can see how this just won't cut the mustard for a serious server.

Have a look at the /etc/system file for the values of rlim_fd_cur and rlim_fd_max, to see what your system has set. Then consider whether this is reasonable (you can see how many file descriptors are open while the server is running with the lsof command, ideally with the -p [process ID] parameter).
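If you can add a small amount of code to the server, the JVM can also report its own descriptor usage. Here's a minimal sketch that relies on the Sun/Oracle-specific com.sun.management.UnixOperatingSystemMXBean, hence the instanceof guard:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

public class FdCount {
    public static void main(String[] args) {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        // The com.sun.management subinterface is only present on Sun/Oracle JDKs.
        if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
            com.sun.management.UnixOperatingSystemMXBean unix =
                    (com.sun.management.UnixOperatingSystemMXBean) os;
            System.out.println("open fds: " + unix.getOpenFileDescriptorCount()
                    + " / max: " + unix.getMaxFileDescriptorCount());
        }
    }
}
```

Logging those two numbers periodically from inside the application gives you the same trend data as the lsof loop, without needing shell access to the box.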

疯言疯语
Answer 6 · 2019-01-23 18:18

I would start by asking my sysadmin to get a listing of all open file descriptors for the process. Different systems do this in different ways: Linux, for example, has the /proc/PID/fd directory. I recall that Solaris has a command (maybe pfiles?) that will do the same thing -- your sysadmin should know it.
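To illustrate the /proc approach, here's a Linux-only Java sketch that lists its own process's open descriptors; each entry under /proc/self/fd is a symlink to the open file (pfiles is the Solaris-native counterpart):

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class ListOwnFds {
    public static void main(String[] args) throws IOException {
        // Linux-only: /proc/self/fd holds one symlink per open descriptor.
        Path fdDir = Paths.get("/proc/self/fd");
        try (DirectoryStream<Path> fds = Files.newDirectoryStream(fdDir)) {
            for (Path fd : fds) {
                // Each link target names the open file, pipe, or socket.
                System.out.println(fd.getFileName() + " -> " + Files.readSymbolicLink(fd));
            }
        }
    }
}
```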

However, unless you see a lot of references to the same file, an fd list isn't going to help you. If it's a server process, it probably has lots of files (and sockets) open for a reason. The only way to resolve the problem then is to adjust the system limit on open files -- you can also check the per-user limit with ulimit, but in most current installations that equals the system limit.

SAY GOODBYE
Answer 7 · 2019-01-23 18:21

This may not be practical in your case, but what I did once, when I had a similar problem with open database connections, was override the "open" function with my own. (Conveniently, I already had this function because we had written our own connection pooling.) In my function I then added an entry to a table recording the open. I made a stack-trace call and saved the identity of the caller, along with the time of the call and I forget what else. When the connection was released, I deleted the table entry. Then I had a screen where we could dump the list of open entries. You could look at the timestamps and easily see which connections had been open for unlikely amounts of time, and which functions had done these opens.

From this we were able to quickly track down the couple of functions that were opening connections and failing to close them.
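Here's a rough Java sketch of that bookkeeping, moved from connections to plain Closeables; every name in it is invented for illustration. The open path records a timestamp and the caller's stack, the release path deletes the entry, and the dump plays the role of the screen:

```java
import java.io.Closeable;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public final class OpenTracker {
    private static final Map<Closeable, Entry> TABLE =
            new ConcurrentHashMap<Closeable, Entry>();

    private static final class Entry {
        final long openedAt = System.currentTimeMillis();
        final Throwable caller = new Throwable("opened here"); // captures the stack
    }

    // Call from your "open" function right after creating the resource.
    public static <T extends Closeable> T register(T resource) {
        TABLE.put(resource, new Entry());
        return resource;
    }

    // Call from your "close"/release path.
    public static void unregister(Closeable resource) {
        TABLE.remove(resource);
    }

    // The "screen": list everything open longer than maxAgeMillis, with its opener.
    public static void dumpOlderThan(long maxAgeMillis) {
        long now = System.currentTimeMillis();
        for (Entry e : TABLE.values()) {
            if (now - e.openedAt > maxAgeMillis) {
                System.err.println("open for " + (now - e.openedAt) + " ms:");
                e.caller.printStackTrace();
            }
        }
    }
}
```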

If you have lots of open file handles, the odds are that somewhere you're failing to close them when you're done. You say you've checked for proper try/finally blocks, but I'd suspect that somewhere in the code you either missed a bad one, or you have a function that hangs and never makes it to the finally. I suppose it's also possible that you really are doing proper closes every time you open a file, but you are opening hundreds of files simultaneously. If that's the case, I'm not sure what you can do other than a serious program redesign to manipulate fewer files, or a serious program redesign to queue your file accesses. (At this point I add the usual "without knowing the details of your application" disclaimer.)
