Socket accept - “Too many open files”

2019-01-07 04:33发布

问题:

I am working on a school project where I had to write a multi-threaded server, and now I am comparing it to apache by running some tests against it. I am using autobench to help with that, but after I run a few tests, or if I give it too high of a rate (around 600+) to make the connections, I get a "Too many open files" error.

After I am done with dealing with request, I always do a close() on the socket. I have tried to use the shutdown() function as well, but nothing seems to help. Any way around this?

回答1:

There are multiple places where Linux can have limits on the number of file descriptors you are allowed to open.

You can check the following:

cat /proc/sys/fs/file-max

That will give you the system wide limits of file descriptors.

On the shell level, this will tell you your personal limit:

ulimit -n

This can be changed in /etc/security/limits.conf - it's the nofile param.

However, if you're closing your sockets correctly, you shouldn't receive this unless you're opening a lot of simulataneous connections. It sounds like something is preventing your sockets from being closed appropriately. I would verify that they are being handled properly.



回答2:

I had similar problem. Quick solution is :

ulimit -n 4096

explanation is as follows - each server connection is a file descriptor. In CentOS, Redhat and Fedora, probably others, file user limit is 1024 - no idea why. It can be easily seen when you type: ulimit -n

Note this has no much relation to system max files (/proc/sys/fs/file-max).

In my case it was problem with Redis, so I did:

ulimit -n 4096
redis-server -c xxxx

in your case instead of redis, you need to start your server.



回答3:

TCP has a feature called "TIME_WAIT" that ensures connections are closed cleanly. It requires one end of the connection to stay listening for a while after the socket has been closed.

In a high-performance server, it's important that it's the clients who go into TIME_WAIT, not the server. Clients can afford to have a port open, whereas a busy server can rapidly run out of ports or have too many open FDs.

To achieve this, the server should never close the connection first -- it should always wait for the client to close it.



回答4:

Use lsof -u `whoami` | wc -l to find how many open files the user has



回答5:

This means that the maximum number of simultaneously open files.

Solved:

At the end of the file /etc/security/limits.conf you need to add the following lines:

* soft nofile 16384
* hard nofile 16384

In the current console from root (sudo does not work) to do:

ulimit -n 16384

Although this is optional, if it is possible to restart the server.

In /etc/nginx/nginx.conf file to register the new value worker_connections equal to 16384 divide by value worker_processes.

If not did ulimit -n 16384, need to reboot, then the problem will recede.

PS:

If after the repair is visible in the logs error accept() failed (24: Too many open files):

In the nginx configuration, propevia (for example):

worker_processes 2;

worker_rlimit_nofile 16384;

events {
  worker_connections 8192;
}


回答6:

I had this problem too. You have a file handle leak. You can debug this by printing out a list of all the open file handles (on POSIX systems):

void showFDInfo()
{
   s32 numHandles = getdtablesize();

   for ( s32 i = 0; i < numHandles; i++ )
   {
      s32 fd_flags = fcntl( i, F_GETFD ); 
      if ( fd_flags == -1 ) continue;


      showFDInfo( i );
   }
}

void showFDInfo( s32 fd )
{
   char buf[256];

   s32 fd_flags = fcntl( fd, F_GETFD ); 
   if ( fd_flags == -1 ) return;

   s32 fl_flags = fcntl( fd, F_GETFL ); 
   if ( fl_flags == -1 ) return;

   char path[256];
   sprintf( path, "/proc/self/fd/%d", fd );

   memset( &buf[0], 0, 256 );
   ssize_t s = readlink( path, &buf[0], 256 );
   if ( s == -1 )
   {
        cerr << " (" << path << "): " << "not available";
        return;
   }
   cerr << fd << " (" << buf << "): ";

   if ( fd_flags & FD_CLOEXEC )  cerr << "cloexec ";

   // file status
   if ( fl_flags & O_APPEND   )  cerr << "append ";
   if ( fl_flags & O_NONBLOCK )  cerr << "nonblock ";

   // acc mode
   if ( fl_flags & O_RDONLY   )  cerr << "read-only ";
   if ( fl_flags & O_RDWR     )  cerr << "read-write ";
   if ( fl_flags & O_WRONLY   )  cerr << "write-only ";

   if ( fl_flags & O_DSYNC    )  cerr << "dsync ";
   if ( fl_flags & O_RSYNC    )  cerr << "rsync ";
   if ( fl_flags & O_SYNC     )  cerr << "sync ";

   struct flock fl;
   fl.l_type = F_WRLCK;
   fl.l_whence = 0;
   fl.l_start = 0;
   fl.l_len = 0;
   fcntl( fd, F_GETLK, &fl );
   if ( fl.l_type != F_UNLCK )
   {
      if ( fl.l_type == F_WRLCK )
         cerr << "write-locked";
      else
         cerr << "read-locked";
      cerr << "(pid:" << fl.l_pid << ") ";
   }
}

By dumping out all the open files you will quickly figure out where your file handle leak is.

If your server spawns subprocesses. E.g. if this is a 'fork' style server, or if you are spawning other processes ( e.g. via cgi ), you have to make sure to create your file handles with "cloexec" - both for real files and also sockets.

Without cloexec, every time you fork or spawn, all open file handles are cloned in the child process.

It is also really easy to fail to close network sockets - e.g. just abandoning them when the remote party disconnects. This will leak handles like crazy.



回答7:

it can take a bit of time before a closed socket is really freed up

lsof to list open files

cat /proc/sys/fs/file-max to see if there's a system limit



回答8:

Just another information about CentOS. In this case, when using "systemctl" to launch process. You have to modify the system file ==> /usr/lib/systemd/system/processName.service .Had this line in the file :

LimitNOFILE=50000

And just reload your system conf :

systemctl daemon-reload


回答9:

When your program has more open descriptors than the open files ulimit (ulimit -a will list this), the kernel will refuse to open any more file descriptors. Make sure you don't have any file descriptor leaks - for example, by running it for a while, then stopping and seeing if any extra fds are still open when it's idle - and if it's still a problem, change the nofile ulimit for your user in /etc/security/limits.conf



回答10:

I had the same problem and I wasn't bothering to check the return values of the close() calls. When I started checking the return value, the problem mysteriously vanished.

I can only assume an optimisation glitch of the compiler (gcc in my case), is assuming that close() calls are without side effects and can be omitted if their return values aren't used.



标签: c sockets