I’m really stuck with this issue and will greatly appreciate any advice.
The problem: Some of our users complain about total system “freezing” when using our product. No matter how we tried, we couldn’t reproduce it in any of systems available for troubleshooting.
The product: Physically, it’s a 32bit/64bit DLL. The product has a self-refreshing GUI, which draws a realtime spectrogram of an audio signal
Problem details: What I managed to collect from a number of fragmentary reports makes the following picture:
When GIU is opened, sometimes immediately, sometimes after a few minutes of GIU being visible, the system completely stalls, without possibility to operate with windows, start Task Manager etc. No reactions on keyboard, no mouse cursor seen (or it’s seen but is not responsibe to mouse movements – this I do not know). The user has to hard-reset the system in order to reboot. What is important, I think, is that (in some cases) for some time the GIU is responsive and shows some adequate pictures. Then this freezing happens. One of the reports tells that once the system was frozen, the audio continued to be rendered – i.e. heard by the reporter (but the whole graphic shell of Windows was already frozen). Note: in this sort of apps it’s usually a specialized thread which is responsible for sound processing.
The freezing is more or less confirmed to happen for 2 users on Windows7 x64 using both 32 and 64 bit versions of the DLL, never heard of any other OSs mentioned with connection to this freezing (though there was 1 report without any OS specified).
That’s all that I managed to collect.
The architecture / suspicions:
I strongly suspect that it’s the GUI refreshing cycle that is a culprit.
Basically, it works like this:
- There is a timer that triggers callbacks at a frame rate of approx 25 fps.
- In this callback audio analysis is performed and GUI updated
Some details about the timer:
It’s based on this call:
CreateTimerQueueTimer(&m_timerHandle, NULL, xPlatformTimerCallbackWrapper,
this, m_firstExpInterval, m_period, WT_EXECUTEINTIMERTHREAD);
We create a timer and m_timerHandle
is called periodically.
Some details about the GUI refreshing:
It works like this:
HDC hdc = GetDC (hwnd);
// Some drawing
ReleaseDC(hwnd,hdc);
My intuition tells me that this CreateTimeQueueTimer
might be not the right decision. The reference page tells that in case of using WT_EXECUTEINTIMERTHREAD
:
The callback function is invoked by the timer thread itself. This flag should be used only for short tasks or it could affect other timer operations. The callback function is queued as an APC. It should not perform alertable wait operations.
I don’t remember why this WT_EXECUTEINTIMERTHREAD
option was chosen actually, now WT_EXECUTEDEFAULT
seems equally suitable for me.
In fact, I don’t see any major difference in using any of the options mentioned in the reference page.
Questions:
- Is anything of what was told give anyone any clue on what might be wrong?
- Have you faced similar problems, what was the reason?
Thanks for any info!
==========================================
Update: 2010-02-20
Unfortunatelly, the advise given here (which I could check so far) didn't help, namelly:
- changing to WT_EXECUTEDEFAULT in
CreateTimerQueueTimer(&m_timerHandle,NULL,xPlatformTimerCallbackWrapper,this,m_firstExpInterval,m_period, WT_EXECUTEDEFAULT);
- the reenterability guard was already there
- I havent' yet checked if updateding the GUI in WM_PAINT hander helps or not
Thanks for the hints anyway.
Now, I've been playing with this for a while, also got a real W7 intallation (I used to use the virtual one) and it seems that the problem can be narrowed down.
On my installation, using of the app really get the GUI far less responsive, although I couldn't manage to reproduce a total system freezing as someone reported.
My assumption now is this responsiveness degradation and reported total freezing have a common origin.
Then I did some primitive profiling and found that at least one of the culprits is BitBlt function that is called approx 50 times a second
BitBlt ((HDC)pContext->getSystemContext (), // hdcDest
destRect.left + pContext->offset.h,
destRect.top + pContext->offset.v,
destRect.right - destRect.left,
destRect.bottom - destRect.top,
(HDC)pSystemContext,
srcOffset.h,
srcOffset.v,
SRCCOPY);
The regions being copied are not really large (approx. 400x200 pixels). It is used for displaying the backbuffer and is executed in the timer callback.
If I comment out this BitBlt call, the problem seems to disappear (at least partly).
On the same machine running WinXP everything works just fine.
Any ideas on this?