WSASend to all connected sockets in a multithreaded IOCP server

Posted 2019-02-20 07:50

Question:

I'm working on an IOCP server (overlapped I/O, 4 threads, CreateIoCompletionPort, GetQueuedCompletionStatus, WSASend, etc.). I also created an auto-reset event and put its handle in the OVERLAPPED structure for each asynchronous I/O operation.

My question is: how do I properly send a buffer to all connected sockets? Each socket is stored in a linked list of context-info structures.

I'm not sure whether the approach below is okay:

...
DWORD WINAPI WorkerThread() { // 1 of 4 worker threads
    ...
    GetQueuedCompletionStatus(...);
    ...
    PPER_SOCKET_CONTEXT pTmp1, pTmp2;

    EnterCriticalSection(&g_CriticalSection);   // lock the list before walking it
    pTmp1 = g_pCtxtList;                        // head of the linked list of socket contexts
    while (pTmp1)
    {
        pTmp2 = pTmp1->pCtxtBack;
        // If there is a pending WSASend on this socket, wait for it to complete,
        // so we can post another WSASend using the same OVERLAPPED structure
        // and buffer.
        WaitForSingleObject(pTmp1->pIOContext->Overlapped.hEvent, INFINITE);
        WSASend(pTmp1->Socket, ..., &(pTmp1->pIOContext->Overlapped), NULL);
        pTmp1 = pTmp2;
    }
    LeaveCriticalSection(&g_CriticalSection);
    ...
}

And what happens if another thread tries to do the same work at the same time?
Is it a good idea to use GetQueuedCompletionStatus plus wait functions like this in all the worker threads?
Any clue about doing WSASend to all clients in a multithreaded IOCP server would be appreciated.
thx

Answer 1:

As Martin says, this will perform terribly and will likely kill the performance of anything else that uses the list of sockets, which you lock for the entire duration of your send to all connections. You don't say whether this is UDP or TCP, but if it's TCP, be aware that you are handing control of your server's performance over to the clients: TCP flow control on a slow client connection may delay the write completion (see here) - and I assume you're using the write completion to trigger the event?

I assume that your actual requirement is that you want to avoid copying the data on the server and allocating multiple buffers, one for each connection, either because of memory constraints or because you've profiled the memory copy and found that it's expensive.

The way I deal with this is to have a single reference-counted buffer and a 'buffer handle', which is just a slightly extended OVERLAPPED structure that references your single data buffer and provides the WSABUF you need. You can then issue a 'fire and forget' write to each connection using a unique 'buffer handle', all of which refer to the single underlying buffer. Once all the writes complete, the reference count on the buffer drops to zero and it cleans itself up - and as Martin says, that clean-up is best achieved by putting the buffer into a pool for later reuse.
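
Very roughly, something along these lines - this is only a sketch of the idea, not production code; the SHARED_BUFFER and SEND_HANDLE names and the ReturnBufferToPool helper are invented for illustration, and PPER_SOCKET_CONTEXT is assumed to be the per-socket context type from your question:

#include <winsock2.h>
#include <stdlib.h>

// One payload shared by every connection.
struct SHARED_BUFFER {
    volatile LONG refCount;     // posting loop + each pending send holds a reference
    char          data[8192];
    DWORD         dataLen;
};

// One of these per WSASend call - the 'buffer handle'.
struct SEND_HANDLE {
    OVERLAPPED     ov;          // first member, so an LPOVERLAPPED casts back to SEND_HANDLE*
    WSABUF         wsaBuf;      // points into pShared->data
    SHARED_BUFFER *pShared;
};

void ReturnBufferToPool(SHARED_BUFFER *pShared);   // hypothetical pool helper

// pList is the linked list of per-socket contexts from the question.
void BroadcastToAll(SHARED_BUFFER *pShared, PPER_SOCKET_CONTEXT pList)
{
    // Hold one extra reference while posting, so a send that completes
    // immediately cannot release the buffer before the loop has finished.
    InterlockedExchange(&pShared->refCount, 1);

    for (PPER_SOCKET_CONTEXT p = pList; p != NULL; p = p->pCtxtBack)
    {
        SEND_HANDLE *pHandle = (SEND_HANDLE *)calloc(1, sizeof(SEND_HANDLE));
        pHandle->pShared    = pShared;
        pHandle->wsaBuf.buf = pShared->data;
        pHandle->wsaBuf.len = pShared->dataLen;

        InterlockedIncrement(&pShared->refCount);   // one reference per posted send

        DWORD sent = 0;
        if (WSASend(p->Socket, &pHandle->wsaBuf, 1, &sent, 0,
                    &pHandle->ov, NULL) == SOCKET_ERROR
            && WSAGetLastError() != WSA_IO_PENDING)
        {
            InterlockedDecrement(&pShared->refCount);   // the send never got queued
            free(pHandle);
        }
    }

    // Drop the posting reference; if every send has already completed,
    // this is what returns the buffer to the pool.
    if (InterlockedDecrement(&pShared->refCount) == 0)
        ReturnBufferToPool(pShared);
}

The extra reference taken before the loop matters: without it, a send that completes immediately on the first socket could drop the count to zero and release the buffer while you are still posting to the rest of the list.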

Note: I'm not sure that I actually understand what you are trying to do (so I originally deleted my answer); let me know if I'm not following and I'll adjust...



Answer 2:

Not sure I understand some of that. IOCP typically does not use the hEvent field in the OVERLAPPED struct; I/O completion is signaled by queueing a completion message to the 'completion port' (i.e. a queue). You seem to be using the hEvent field for some 'unusual' extra signaling to manage a single send data buffer and OVERLAPPED block.

Obviously, I don't have the whole story from your post, but it looks to me like you are making heavy work for yourself on the tx side, and serialising the sends will strangle performance :)

Do you HAVE to use the same OVL/buffer object for successive sends? What I usually do is use a different OVL/buffer for each send and just queue it up immediately. The kernel will send the buffers in sequence and return a completion message for each one. There is no problem with having multiple IOCP tx requests outstanding on a socket - that's what the OVL block is for: to link them together inside the kernel stack.
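
As a rough illustration of what I mean (the TX_OP layout and the PostSend name are just assumptions for this sketch, not code from any real server):

#include <winsock2.h>
#include <stdlib.h>
#include <string.h>

// One of these per send: its own OVERLAPPED, WSABUF and private copy of the data.
struct TX_OP {
    OVERLAPPED ov;              // first member, so an LPOVERLAPPED casts back to TX_OP*
    WSABUF     wsaBuf;
    char       data[4096];
};

bool PostSend(SOCKET s, const char *payload, DWORD len)
{
    TX_OP *pOp = (TX_OP *)calloc(1, sizeof(TX_OP));
    if (!pOp)
        return false;
    if (len > sizeof(pOp->data))
    {
        free(pOp);
        return false;
    }

    memcpy(pOp->data, payload, len);
    pOp->wsaBuf.buf = pOp->data;
    pOp->wsaBuf.len = len;

    DWORD sent = 0;
    if (WSASend(s, &pOp->wsaBuf, 1, &sent, 0, &pOp->ov, NULL) == SOCKET_ERROR
        && WSAGetLastError() != WSA_IO_PENDING)
    {
        free(pOp);              // failed immediately - nothing is outstanding
        return false;
    }
    return true;                // the completion packet will turn up at the IOCP later
}

Each queued buffer stays alive until its own completion packet arrives at the IOCP, at which point the worker thread simply frees (or repools) the TX_OP it gets back.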

There is an issue with having multiple IOCP receive requests outstanding for a socket - two pool threads can pick up completion packets for the same socket at the same time, possibly resulting in out-of-order processing. Fixing that issue 'properly' requires something like an incrementing sequence number in each rx buffer/OVL object issued, plus a critical section and buffer list in each socket object to 'save up' out-of-order buffers until all the earlier ones have been processed. I have a suspicion that many IOCP servers just dodge this issue by only having one rx IOCP request outstanding at a time (probably at the expense of performance).
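
If you did want to handle it 'properly', a sketch of that sequence-number idea might look something like this - the names (RX_OP, PER_SOCKET, ProcessData) are invented and error handling is omitted, so treat it as an outline rather than tested code:

#include <winsock2.h>

// One of these per outstanding WSARecv.
struct RX_OP {
    OVERLAPPED ov;
    WSABUF     wsaBuf;
    char       data[4096];
    LONG       seq;             // stamped from nextSeqToIssue when the read is posted
    RX_OP     *pNext;           // link used while parked in the out-of-order list
};

struct PER_SOCKET {
    SOCKET           sock;
    CRITICAL_SECTION cs;
    LONG             nextSeqToIssue;    // incremented as each WSARecv is posted (not shown)
    LONG             nextSeqToProcess;  // lowest sequence number not yet processed
    RX_OP           *pParked;           // completions that arrived out of order
};

void ProcessData(PER_SOCKET *pSock, RX_OP *pOp);   // hypothetical application handler

void OnReadComplete(PER_SOCKET *pSock, RX_OP *pOp)
{
    EnterCriticalSection(&pSock->cs);

    // Park the buffer, then drain everything that is now in order.
    pOp->pNext = pSock->pParked;
    pSock->pParked = pOp;

    bool found = true;
    while (found)
    {
        found = false;
        for (RX_OP **pp = &pSock->pParked; *pp != NULL; pp = &(*pp)->pNext)
        {
            if ((*pp)->seq == pSock->nextSeqToProcess)
            {
                RX_OP *pReady = *pp;
                *pp = pReady->pNext;            // unlink from the parked list
                ProcessData(pSock, pReady);     // handle (and eventually repool) the buffer
                pSock->nextSeqToProcess++;
                found = true;
                break;
            }
        }
    }

    LeaveCriticalSection(&pSock->cs);
}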

Getting through a lot of buffers in this way could be somewhat taxing if they are being continually constructed and destroyed, so I don't normally bother and just create a few thousand of them at startup and push them, (OK, pointers to them), onto a producer-consumer 'pool queue', popping them off when a tx or rx is required and pushing them back on again. In the case of tx, this would happen when a send completion message is picked up by one of the IOCP pool threads. In the case of rx, it would happen when a pool thread, (or some other thread that has had the object queued to it by a pool thread), has processed it and no longer needs it.
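
For illustration, here is a bare-bones version of such a pool, guarded by a critical section; a real producer-consumer 'pool queue' would block (or create more buffers) when empty, whereas this one just returns NULL, and the IO_BUF/PopBuffer/PushBuffer names are invented:

#include <winsock2.h>
#include <stdlib.h>
#include <string.h>

// A pooled I/O object: OVERLAPPED, WSABUF and data travel together.
struct IO_BUF {
    OVERLAPPED ov;
    WSABUF     wsaBuf;
    char       data[4096];
    IO_BUF    *pNextFree;       // link used only while the buffer sits in the pool
};

static CRITICAL_SECTION g_PoolLock;
static IO_BUF          *g_pFreeList = NULL;

void InitBufferPool(int count)      // e.g. a few thousand at startup
{
    InitializeCriticalSection(&g_PoolLock);
    for (int i = 0; i < count; ++i)
    {
        IO_BUF *p = (IO_BUF *)calloc(1, sizeof(IO_BUF));
        p->pNextFree = g_pFreeList;
        g_pFreeList = p;
    }
}

IO_BUF *PopBuffer()                 // call when a tx or rx is about to be posted
{
    EnterCriticalSection(&g_PoolLock);
    IO_BUF *p = g_pFreeList;
    if (p)
        g_pFreeList = p->pNextFree;
    LeaveCriticalSection(&g_PoolLock);
    return p;                       // NULL means the pool has run dry
}

void PushBuffer(IO_BUF *p)          // call when a completion has been fully handled
{
    memset(&p->ov, 0, sizeof(p->ov));   // OVERLAPPED must be clean before reuse
    EnterCriticalSection(&g_PoolLock);
    p->pNextFree = g_pFreeList;
    g_pFreeList = p;
    LeaveCriticalSection(&g_PoolLock);
}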

Ahh.. you want to send exactly the same content to the list of sockets - like a chat server type thingy.

OK. So how about one buffer and multiple OVL blocks? I have not tried it, but I don't see why it would not work. In the single buffer object, keep an atomic reference count of how many overlapped send requests you have sent out in your 'send to all clients' loop. When you get the buffers back in the completion packets, decrement the refCount towards zero and delete/repool the buffer when it reaches 0.
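
The completion side of that scheme could look roughly like this (SHARED_TX, TX_OVL, OnSendComplete and RepoolSharedBuffer are invented names, and I haven't tried this exact code):

#include <winsock2.h>
#include <stdlib.h>

// One shared payload, many per-send OVERLAPPED blocks that all point at it.
struct SHARED_TX {
    volatile LONG refCount;     // set to the number of sends posted in the broadcast loop
    char          data[8192];
    DWORD         dataLen;
};

struct TX_OVL {
    OVERLAPPED ov;
    WSABUF     wsaBuf;          // buf/len point into pShared->data
    SHARED_TX *pShared;
};

void RepoolSharedBuffer(SHARED_TX *pShared);   // hypothetical: push it back onto the pool

// Called by an IOCP worker thread when GetQueuedCompletionStatus() hands back
// an LPOVERLAPPED that is known to belong to one of these send operations.
void OnSendComplete(LPOVERLAPPED pOvl)
{
    TX_OVL *pTx = CONTAINING_RECORD(pOvl, TX_OVL, ov);

    // The last completion in wins the clean-up.
    if (InterlockedDecrement(&pTx->pShared->refCount) == 0)
        RepoolSharedBuffer(pTx->pShared);

    free(pTx);                  // or push the TX_OVL back onto a pool as well
}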

I think that should work, (?).