After sending a lot, my send() call causes my prog

Posted 2019-04-12 05:14

Question:

I'm writing an MMO server in C++ that runs on Linux. It works fine at first, but after maybe 40 seconds with 50 clients it pauses completely. When I debug it, the last frame it's on before it stops responding is syscall(), at which point it disappears into the kernel. Once it is in the kernel it never even returns a value. It's completely baffling.

The 50 clients each send 23 bytes every 250 milliseconds. These 23 bytes are then broadcast to the other 49 clients. This process begins to slow down and eventually comes to a complete halt, where the kernel never returns from the syscall made by send(). What are some possible reasons for this? It is truly driving me nuts!

One possible cause I found is Nagle's algorithm, which forces delays. I've tried toggling it, but the hang still happens.

Edit: The program is stuck here, specifically in send(), which in turn calls syscall():

    // Meant to counteract partial sends
    bool EpollManager::s_send(int curFD, unsigned char buf[], int bufLen, int flag)
    {
        int sendRetVal = 0;
        int bytesSent = 0;
        while(bytesSent != bufLen)
        {
            print_buffer(buf, bufLen);
            sendRetVal = send(curFD, buf + bytesSent, bufLen - bytesSent, flag);

            cout << sendRetVal << " ";
            if(sendRetVal == -1)
            {
                perror("Sending failed");
                return false;
            }
            else
                bytesSent += sendRetVal;
        }
        return true;
    }

This is the method that calls s_send():

    void EpollManager::broadcast(unsigned char msg[], int bytesRead, int sender)
    {
        for(iMap = connections.begin(); iMap != connections.end(); iMap++)
        {
            if(sender != iMap->first)
            {
                if(s_send(iMap->first, msg, bytesRead, 0)) // MSG_NOSIGNAL
                {
                    if(debug)
                    {
                        print_buffer(msg, bytesRead);
                        cout << "sent on file descriptor " << iMap->first << '\n';
                    }
                }
            }
        }
        if(connections.find(sender) != connections.end())
            connections[sender]->reset_batch();
    }

To clarify, connections is an instance of Boost's unordered_map. The data that the program chokes on is not unique in any way either. It has already been broadcast successfully to other file descriptors, but the send chokes on a seemingly random one.

Answer 1:

The kernel keeps a finite buffer for sending data. If the receiver isn't receiving, that buffer will fill up and the sender will block. Could that be the problem?
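The effect can be reproduced in isolation. The following is only a minimal sketch, not the asker's code: the helper name `bytes_until_send_buffer_full` is hypothetical, and it uses a Unix-domain socket pair instead of TCP so that it needs no network, which means the byte count it reports will differ from a real TCP connection. One end is never read, and the other end is switched to non-blocking mode so that a full buffer shows up as EAGAIN rather than as a hang.

```cpp
#include <cerrno>
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>

// Hypothetical helper: sends 23-byte messages into a socket whose peer never
// reads, and returns how many bytes the kernel accepted before its buffer
// filled up. Returns -1 on an unexpected error.
long bytes_until_send_buffer_full()
{
    int fds[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) == -1)
        return -1;

    long total = -1;
    // Non-blocking mode: a full buffer yields EAGAIN instead of blocking.
    int flags = fcntl(fds[0], F_GETFL, 0);
    if (flags != -1 && fcntl(fds[0], F_SETFL, flags | O_NONBLOCK) != -1)
    {
        unsigned char chunk[23] = {0}; // same size as the messages in the question
        total = 0;
        for (;;)
        {
            ssize_t n = send(fds[0], chunk, sizeof chunk, MSG_NOSIGNAL);
            if (n == -1)
            {
                if (errno == EINTR)
                    continue;
                // EAGAIN/EWOULDBLOCK means "buffer full"; anything else is a real error.
                if (errno != EAGAIN && errno != EWOULDBLOCK)
                    total = -1;
                break;
            }
            total += n;
        }
    }
    close(fds[0]);
    close(fds[1]);
    return total;
}
```

With a blocking socket, the call that returns EAGAIN here would instead wait until the peer reads some data, which matches the hang described in the question if one client has stopped receiving.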



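Answer 2:

A full send buffer (sized by the SO_SNDBUF socket option) will cause send() and similar operations on a blocking socket to block. The buffer fills up when TCP flow control and congestion control throttle the connection, for example because the receiver is not reading; Nagle's algorithm, which coalesces small writes, can add further delay on top of that.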
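To check how large that buffer actually is for a given socket, the SO_SNDBUF value can be read back with getsockopt(). This is a small sketch; the function name `send_buffer_size` is made up for illustration.

```cpp
#include <sys/socket.h>

// Hypothetical helper: returns the send-buffer size the kernel reports for
// `fd` in bytes, or -1 if getsockopt() fails.
int send_buffer_size(int fd)
{
    int size = 0;
    socklen_t len = sizeof size;
    if (getsockopt(fd, SOL_SOCKET, SO_SNDBUF, &size, &len) == -1)
        return -1;
    return size;
}
```

On Linux the reported value includes kernel bookkeeping overhead, so the number of payload bytes that fit before send() blocks is usually smaller than this figure.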

The lazy way around this is to use a separate thread for each socket, but this does not scale very far. On Linux you should use non-blocking sockets with poll() or similar; on Windows you would investigate I/O completion ports. Look at middleware libraries to simplify this: libevent is a popular cross-platform example that recently gained Windows IOCP support, and Boost.Asio is an alternative for C++.
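Since the server in the question already uses epoll, one way to apply the non-blocking approach is to keep a per-connection queue of unsent bytes and ask epoll for EPOLLOUT only while that queue is non-empty. The sketch below is an assumption about how this could look, not the asker's EpollManager: `Outbox`, `make_non_blocking`, `flush_outbox` and `queue_and_send` are hypothetical names, and the descriptor is assumed to have been added to the epoll instance already with EPOLL_CTL_ADD.

```cpp
#include <cerrno>
#include <cstddef>
#include <cstdint>
#include <fcntl.h>
#include <string>
#include <sys/epoll.h>
#include <sys/socket.h>

// Hypothetical per-connection state: bytes the application has queued
// but the kernel has not accepted yet.
struct Outbox
{
    std::string pending;
};

// Puts `fd` into non-blocking mode; returns false on failure.
bool make_non_blocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    return flags != -1 && fcntl(fd, F_SETFL, flags | O_NONBLOCK) != -1;
}

// Sends as much of the queue as the kernel will take right now.
// Returns 1 if the queue is empty afterwards, 0 if the send buffer is full
// and bytes remain queued, -1 on a real error.
int flush_outbox(int fd, Outbox& box)
{
    while (!box.pending.empty())
    {
        ssize_t n = send(fd, box.pending.data(), box.pending.size(), MSG_NOSIGNAL);
        if (n == -1)
        {
            if (errno == EINTR)
                continue;
            if (errno == EAGAIN || errno == EWOULDBLOCK)
                return 0; // try again when epoll reports EPOLLOUT
            return -1;
        }
        box.pending.erase(0, static_cast<std::size_t>(n));
    }
    return 1;
}

// Queues a message, sends what it can, and enables EPOLLOUT only while
// unsent bytes remain. Never blocks, so one slow client cannot stall a broadcast.
int queue_and_send(int epfd, int fd, Outbox& box,
                   const unsigned char* msg, std::size_t len)
{
    box.pending.append(reinterpret_cast<const char*>(msg), len);
    int rc = flush_outbox(fd, box);
    if (rc == -1)
        return -1;

    epoll_event ev{};
    ev.data.fd = fd;
    std::uint32_t events = EPOLLIN;
    if (rc == 0)
        events |= EPOLLOUT;
    ev.events = events;
    if (epoll_ctl(epfd, EPOLL_CTL_MOD, fd, &ev) == -1)
        return -1;
    return rc;
}
```

In the event loop, when epoll_wait() reports EPOLLOUT for a descriptor, the server would call flush_outbox() again and drop the EPOLLOUT interest once it returns 1. A real server would probably also cap the queue size and disconnect clients that fall too far behind, since a client that never reads would otherwise make its queue grow without bound.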

A useful article to read on I/O scalability is "The C10K problem".

Note that you really do not want to disable Nagle's algorithm for Internet traffic; even on a LAN you might see major problems without some form of congestion feedback.