I have a native Visual C++ NT service. When the service starts, its thread calls CoInitialize(), which attaches the thread to an STA; the service thread uses MSXML through COM interfaces.
When the service receives SERVICE_CONTROL_STOP, it posts a message to the message queue; later that message is retrieved and the OnStop() handler is invoked. The handler cleans up and calls CoUninitialize(). Most of the time this works all right, but once in a while the latter call hangs. I can't reproduce this behavior reliably.
I googled for a while and found the following likely explanations:
- failing to release all owned COM objects
- repeatedly calling CoInitializeEx()/CoUninitialize() to attach to the MTA
- failing to dispatch messages in STA threads
The first one is unlikely: the code using MSXML is well tested and analyzed, and it uses smart pointers to control object lifetime, so leaking objects is really unlikely.
The second one doesn't look like the cause either. I attach to an STA and don't call those functions repeatedly.
The third one looks more or less likely. While the thread is processing the message it doesn't run the message loop anymore - it is inside the loop already. I suppose this might be the reason.
Is the latter a likely reason for this problem? What other reasons should I consider? How do I resolve this problem easily?
After very careful analysis and using the Visual Studio debugger (thanks to user Pall Betts for pointing out that getting evidence is important) to inspect all active threads, I discovered that the process hangs not in CoUninitialize() but in RpcServerUnregisterIf(), which our code calls right before CoUninitialize(). Here's the sequence of events. An inbound RPC request arrives and the RPC runtime spawns a thread to service it. The request handler queues the request to the worker thread and waits.
Now the moon phase happens to be just right, and RpcServerUnregisterIf() is executed in parallel with the handler in the RPC thread. RpcServerUnregisterIf() waits for all inbound RPC requests to complete, and the RPC handler waits for the main thread to process the request. That's a plain old deadlock.
Don't do anything of consequence in the thread handling SCM messages; it's in a weird magical context. You must answer SCM's requests as fast as possible without taking any blocking action. Tell it you need additional time via STOP_PENDING, queue another thread to do the real cleanup, then immediately complete the SCM message.
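To make the shape of the deadlock and one way around it concrete, here is a minimal sketch in portable C++ rather than Win32. Everything in it is a hypothetical stand-in: `Worker`, `handle_request`, `request_stop`, `run` and `demo_shutdown` are illustration names, not part of the service above or of any Windows API. `handle_request` plays the RPC handler that queues work and waits, `request_stop` plays a stop notification that only sets a flag and returns at once, and `run` keeps draining the queue until no handler is still waiting, instead of blocking while handlers depend on it.

```cpp
#include <cassert>
#include <condition_variable>
#include <future>
#include <mutex>
#include <queue>
#include <thread>

// Hypothetical stand-in for the service's worker thread and its request queue.
class Worker {
public:
    // Plays the RPC handler: queue work for the worker thread and wait for it.
    // Returns false if shutdown has begun and the request was refused.
    bool handle_request(int input, int& output) {
        std::packaged_task<int()> task([input] { return input * 2; });
        std::future<int> result = task.get_future();
        {
            std::lock_guard<std::mutex> lock(mutex_);
            if (stopping_) return false;  // refuse new work during shutdown
            ++in_flight_;
            tasks_.push(std::move(task));
        }
        wake_.notify_all();
        output = result.get();  // blocks until the worker has run the task
        {
            std::lock_guard<std::mutex> lock(mutex_);
            --in_flight_;
        }
        wake_.notify_all();
        return true;
    }

    // Plays the stop notification: flips a flag and returns, never blocks.
    void request_stop() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        wake_.notify_all();
    }

    // Worker thread body: keeps serving queued requests and exits only when
    // stop was requested and no handler is still waiting on this thread.
    void run() {
        std::unique_lock<std::mutex> lock(mutex_);
        for (;;) {
            wake_.wait(lock, [this] {
                return !tasks_.empty() || (stopping_ && in_flight_ == 0);
            });
            if (tasks_.empty()) {
                assert(in_flight_ == 0);  // nobody is left waiting on us
                return;
            }
            std::packaged_task<int()> task = std::move(tasks_.front());
            tasks_.pop();
            lock.unlock();
            task();  // run the request outside the lock
            lock.lock();
        }
    }

private:
    std::mutex mutex_;
    std::condition_variable wake_;
    std::queue<std::packaged_task<int()>> tasks_;
    int in_flight_ = 0;
    bool stopping_ = false;
};

// Two simulated RPC calls complete, then shutdown finishes without hanging
// and a late request is refused.
bool demo_shutdown() {
    Worker w;
    std::thread worker([&w] { w.run(); });
    int a = 0, b = 0, c = 0;
    bool ok_a = false, ok_b = false;
    std::thread rpc1([&] { ok_a = w.handle_request(21, a); });
    std::thread rpc2([&] { ok_b = w.handle_request(5, b); });
    rpc1.join();
    rpc2.join();
    w.request_stop();
    worker.join();
    bool refused = !w.handle_request(1, c);
    return ok_a && a == 42 && ok_b && b == 10 && refused;
}
```

The point of the design is that the thread other handlers depend on never parks itself in a wait those handlers cannot satisfy. In a real service, the blocking RpcServerUnregisterIf() call would presumably have to move to a thread that no RPC handler waits on, or the queue would have to be drained first; which of the two fits depends on details of the service that are not shown here.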
As for the CoUninitialize() hang, just attach WinDbg and dump all the threads (for example with ~*k). Deadlocks are easy to diagnose (maybe not to fix!): you've got all of the parties to the crime right there in the stacks.