I guess this argument is important and deserve some space here.
Let's consider the most common design of I/O Completion Ports in C/C++, having a structure (or a class) which abstract the HANDLE, and some of its properties, like this:
class Stream
{
enum
{
Open = 1,
Closed = 0
};
// Dtor
virtual ~Stream() { if (m_read_packet != 0) delete_packet(m_read_packet); // the same for write packet }
// Functions:
bool read(...) { if (m_read_packet != 0) m_read_packet = allocate_packet(); ReadFile(m_handle ...); }
bool write(...);
bool close() { m_state = Closed; if (!CloseHandle(m_handle)) return false; else return true; }
// Callbacks:
virtual void onRead(...);
virtual void onWrite(...);
virtual void onEof();
virtual void onClose();
// Other stuff ....
// Properties:
HANDLE m_handle;
int m_state;
int m_pending_reads;
int m_pending_writes;
IoPacket * m_read_packet;
IoPacket * m_write_packet;
};
This is pretty straightforward, you have this class which abstract an HANDLE, and the IO Completion Port threads call its callbacks when i/o is completed. Plus, you can cache and reuse the OVERLAPPED structures (derived by IoPacket) stored in those pointers.
You use this class by allocating a Stream object yourself, or just by let other classes to allocate it, after a connection arrives (e.g. AcceptEx(), or a named pipe server). The Stream object will automatically start to do i/o operation based on its callbacks implementation.
Now, in both cases you choose [1) allocate Stream yourself (client) or 2) from other objects (server)] there will be a common problem: you don't know exactly when you can safely deallocate from memory the Stream object. And this is true even if you use atomic ref counters for IoPacket objects.
Let's see why: while doing i/o, the Stream object pointer would be contained in the IoPacket objects, which in turn are processed and used by other threads in the threadpool, which in turn uses that pointer to call the callbacks [onRead(), onWrite() and so on...]. You have also `m_pending_reads/writes' variables to see how many pending reads/writes there are. So, the Stream object pointer will be shared accross threads.
At this point, let's consider you don't need that connection anymore and you want to close and deallocate the related Stream object.
how to do this safely?
I mean, if you simply do:
stream->close();
delete stream;
you'll end up in some crashes because another thread may still be using the 'stream' pointer. The safe way would be this:
stream->close();
while ((stream->m_pending_reads != 0) && (stream->m_pending_writes != 0));
delete stream;
But this is utterly inefficent since this blocks the path of executiong, block-waiting that other threads finish their work with the 'stream' object. (And in addition I think I'll have add some volatile or barrier mechanism?) The best way would be do that asynchronously, with another thread completing the LAST IoPacket for that Stream object, and call onClose() which in turn will be:
void onClose() { delete this; }
So my question is: how to recognize that I'm processing (in any thread) the LAST IoPacket (or OVERLAPPED) for the given handle, so after I could call safely the onClose() routine which deletes the Stream objects, which dtor in turn deletes the cached IoPackets, used for doing i/o stuff.
For now I use this method: if (for reading completions) `GetQueuedCompletionStatus()' returns ERROR_OPERATION_ABORTED [so it means CloseHandle(m_handle); has been called] OR bytes_read==0 [it means EOF] OR if (state!=Open) THEN it means that it was the last reading packet pending, then calling onClose(); after processing that. The write packets will ignore onClose().
I'm not sure that this is the correct method, because even if the reading packets are those which will be always pending until an i/o event happens, it may happen that onClose() will be called by some thread scheduled before another thread which is willing to process a last write packet.
And what about those Stream objects derivatives (e.g. TcpClient) which also have other IoPackets? e.g. one for ConnectEx().
A possible solution would be to have AtomicInt m_total_pending_requests; <- and when this atomic integer reaches 0, the completion thread calls onClose();
But, is this last solution inefficent? Why do I have use atomic integers to do such a basic thing like closing a connection and deallocating its structures?
Maybe I'm totally in the wrong path with design for IOCPs with those Stream objects, but i guess this is a pretty common used design for abstracting HANDLEs, so I wanted to hear some opinion from IOCPs gurus. Do I have to change everything, or this design of Stream objects could work pretty solid, safe, and most important, fast? I ask this because of the complexity came up for doing such a basic thing like close+free memory.
Reference counting solves the problem.
You simply increment a reference count when you do anything which will generate an IOCP completion and decrement it once you have finished processing the completion. Whilst there are operations outstanding your reference is > 1 and the handle object is valid. Once all operations complete your reference drops to 0 and the handle can delete itself.
I do the same for data buffers as my code allows for multiple concurrent send/recv operations to be in progress per connection and so storing overlapped structures in the handle itself isn't an option.
I have some code which demonstrates the above, you can download it from here.