How do I resolve a process hanging on CoUninitialize()?

Posted 2019-06-28 01:15

Question:

I have a native Visual C++ NT service. When the service is started, its thread calls CoInitialize(), which attaches the thread to an STA. The service thread uses MSXML through COM interfaces.

When the service receives SERVICE_CONTROL_STOP, it posts a message to the message queue. Later that message is retrieved and the OnStop() handler is invoked. The handler cleans up and calls CoUninitialize(). Most of the time this works all right, but once in a while the latter call hangs. I can't reproduce this behavior reliably.

I googled for a while and found the following likely explanations:

  1. failing to release all COM objects owned
  2. repeatedly calling CoInitializeEx()/CoUninitialize() for attaching to MTA
  3. failing to dispatch messages in STA threads

The first one seems improbable: the code using MSXML is well tested and analyzed, and it uses smart pointers to control object lifetimes, so leaked objects are really unlikely.

The second one doesn't look like the reason either. I attach to an STA and don't call those functions repeatedly.

The third one looks more or less likely. While the thread is processing the message it doesn't run the message loop anymore - it is inside the loop already. I suppose this might be the reason.

Is the third one a likely reason for this problem? What other reasons should I consider? How do I resolve this problem easily?

Answer 1:

Don't do anything of consequence in the thread handling SCM messages. It's in a weird, magical context: you must answer the SCM's requests as fast as possible without taking any blocking action. Tell it you need additional time by reporting SERVICE_STOP_PENDING, queue another thread to do the real cleanup, then immediately complete the SCM message.

As for the CoUninitialize() hang, just attach WinDbg and dump all the threads. Deadlocks are easy to diagnose (though maybe not to fix): you've got all of the parties to the crime right there in the stacks.



Answer 2:

After very careful analysis and using the Visual Studio debugger (thanks to user Pall Betts for pointing out that getting evidence is important) to inspect all active threads, I discovered that the process hangs not in CoUninitialize() but in RpcServerUnregisterIf(), which our program code calls right before CoUninitialize(). Here's a sequence diagram:

WorkerThread                            RpcThread                    OuterWorld
  |----| Post "stop service" message        |                            |
  |<---|                                    |  SomeRpcServerMethod()     |
  |      Post "process rpc request"         |<---------------------------|
  |<----------------------------------------|                          waits
  |                                         |----|Wait until
  |----| Process "stop service" message     |    |request is processed
  |<---| (call OnStop())                    |    |by the worker thread
  |                                         |    |
  |----| RpcServerUnregisterIf()            |    |
  |X<--| Wait all rpc requests complete     |X<--|
  |                                         |

An inbound RPC request arrives and the RPC runtime spawns a thread to service it. The request handler queues the request to the worker thread and waits.

Now the phase of the moon happens to be just right, so RpcServerUnregisterIf() executes in parallel with the handler in the RPC thread. RpcServerUnregisterIf() waits for all inbound RPC requests to complete, and the RPC handler waits for the worker thread to process the request. That's a plain old deadlock.
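One possible way out, sketched here under assumptions and not taken from the original code, is to have the worker release every queued request before it starts waiting for the RPC handlers. The model below is portable standard C++. `WorkerQueue` and its methods are hypothetical names, and the final wait in `stop()` only stands in for RpcServerUnregisterIf() called with a nonzero WaitForCallsToComplete.

```cpp
#include <condition_variable>
#include <deque>
#include <future>
#include <mutex>
#include <utility>

// Hypothetical model of the worker thread's request queue.
class WorkerQueue {
public:
    // RPC thread side: queue a request and block until the worker answers.
    // Returns false if the request was rejected or abandoned during stop.
    bool submit_and_wait() {
        std::future<bool> answer;
        {
            std::lock_guard<std::mutex> lock(mutex_);
            if (stopping_) return false;  // reject instead of waiting forever
            ++in_flight_;
            pending_.emplace_back();
            answer = pending_.back().get_future();
        }
        bool processed = answer.get();    // blocks until the worker answers
        std::lock_guard<std::mutex> lock(mutex_);
        --in_flight_;
        idle_.notify_all();
        return processed;
    }

    // Worker thread side, normal operation: answer one queued request.
    // Returns false if nothing was queued.
    bool process_one() {
        std::promise<bool> request;
        {
            std::lock_guard<std::mutex> lock(mutex_);
            if (pending_.empty()) return false;
            request = std::move(pending_.front());
            pending_.pop_front();
        }
        request.set_value(true);
        return true;
    }

    // Worker thread side, from the stop handler.
    void stop() {
        std::deque<std::promise<bool>> abandoned;
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;          // no new requests from now on
            abandoned.swap(pending_);  // take everything still queued
        }
        // Release the blocked handlers first, so the wait below can finish.
        for (auto& request : abandoned) request.set_value(false);
        // Stand-in for the blocking unregister: wait for handlers to return.
        std::unique_lock<std::mutex> lock(mutex_);
        idle_.wait(lock, [this] { return in_flight_ == 0; });
    }

    int in_flight() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return in_flight_;
    }

private:
    mutable std::mutex mutex_;
    std::condition_variable idle_;
    std::deque<std::promise<bool>> pending_;
    int in_flight_ = 0;
    bool stopping_ = false;
};
```

If `stop()` skipped the loop that answers the abandoned requests, a handler blocked in `submit_and_wait()` would never return and the final wait would hang, which is the deadlock in the diagram.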