Heisenbug: WinApi program crashes on some computers

Posted 2019-01-25 12:20

Please help! I'm really at my wits' end. My program is a little personal notes manager (google for "cintanotes"). On some computers (and of course I own none of them) it crashes with an unhandled exception just after start. There is nothing special about these computers, except that they tend to have AMD CPUs.

Environment: Windows XP, Visual C++ 2005/2008, raw WinApi.

Here is what is certain about this "Heisenbug":

1) The crash happens only in the Release version.

2) The crash goes away as soon as I remove all GDI-related stuff.

3) BoundsChecker reports no problems.

4) Writing a log shows that the crash happens on a declaration of a local int variable! How could that be? Memory corruption?

Any ideas would be greatly appreciated!

UPDATE: I managed to debug the app on a "faulty" PC. The results:

"Unhandled exception at 0x0044a26a in CintaNotes.exe: 0xC000001D: Illegal Instruction."

and code breaks on

0044A26A cvtsi2sd xmm1,dword ptr [esp+14h]

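cvtsi2sd is an SSE2 instruction (it converts a 32-bit integer to a double), so it seems that the problem was the "Code Generation/Enable Enhanced Instruction Set" compiler option. It was set to "/arch:SSE2", and the program was crashing on machines that don't support SSE2. That also fits the AMD pattern: older AMD CPUs such as the Athlon XP support SSE but not SSE2. I've set this option to "Not Set" and the bug is gone. Phew!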
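Turning off /arch:SSE2 is the simplest fix. If SSE2 code paths are ever wanted back, one common alternative is a runtime CPU check. Keep in mind that /arch:SSE2 affects everything compiled in that translation unit, so the check has to live in code built without it, and SSE2-specific code belongs in separately compiled files that are only called once the check succeeds. A minimal sketch, assuming MSVC or GCC/Clang on x86; the function name cpu_has_sse2 is hypothetical:

```cpp
#if defined(_MSC_VER)
#include <intrin.h>
#elif defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#include <cpuid.h>
#endif

// Hypothetical helper: returns true if the CPU reports SSE2
// (CPUID leaf 1, EDX bit 26). Compile this function WITHOUT /arch:SSE2
// so the check itself cannot execute an SSE2 instruction.
bool cpu_has_sse2()
{
#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
    int info[4] = {};
    __cpuid(info, 0);                      // info[0] = highest standard leaf
    if (info[0] < 1)
        return false;
    __cpuid(info, 1);                      // leaf 1 holds the feature flags
    return (info[3] & (1 << 26)) != 0;     // info[3] = EDX
#elif defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return false;                      // leaf 1 not supported
    return (edx & (1u << 26)) != 0;        // EDX bit 26 = SSE2
#else
    return false;                          // not an x86 target
#endif
}
```

At startup the program could call this and either pick a non-SSE2 code path or show a clear error message instead of dying with 0xC000001D. On x64 the check always succeeds, since SSE2 is part of the base instruction set there.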

Thank you all very much for your help!

11 answers
狗以群分
Reply #2 · 2019-01-25 12:29

1) The crash happens only in the Release version.

That's usually a sign that you're relying on some behaviour that's not guaranteed, but happens to hold in the debug build. For example, you may have forgotten to initialize a variable, or you may be accessing an array out of bounds. Make sure you've turned on all the compiler checks (/RTCsuc). Also check for things like relying on the order in which function arguments are evaluated (which is unspecified).
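The order-of-evaluation point is easy to miss because the code looks perfectly sequential. A small illustration, using hypothetical helpers next_id and format_pair: when both arguments of one call have side effects, the compiler may evaluate them in either order, and an optimized build can legitimately choose a different order than a debug build.

```cpp
#include <string>

namespace {
int g_counter = 0;

// Hypothetical helper: returns 1, 2, 3, ... so the call order is observable.
int next_id()
{
    return ++g_counter;
}

// Hypothetical helper: formats two IDs as "first-second".
std::string format_pair(int first, int second)
{
    return std::to_string(first) + "-" + std::to_string(second);
}
}  // namespace

std::string make_label()
{
    // Risky: format_pair(next_id(), next_id());
    // The evaluation order of function arguments is unspecified,
    // so one build may yield "1-2" and another "2-1".

    // Safe: sequence the calls with separate statements.
    int first = next_id();
    int second = next_id();
    return format_pair(first, second);
}
```

With the calls split into separate statements, the first call to make_label returns "1-2" and the second "3-4" in every build configuration.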

2) The crash goes away as soon as I remove all GDI-related stuff.

Maybe that's a hint that you're doing something wrong with the GDI related stuff? Are you using HANDLEs after they've been freed, for example?

爷的心禁止访问
Reply #3 · 2019-01-25 12:29

Download the Debugging Tools for Windows package. Set the symbol paths correctly, then run your application under WinDbg. At some point it will break on the exception (an access violation, for example). Then run the command "!analyze -v", which is quite smart and should give you a hint about what's going wrong.

做自己的国王
Reply #4 · 2019-01-25 12:31

4) Writing a log shows that the crash happens on a declaration of a local int variable! How could that be? Memory corruption?

What is the underlying code in the executable (the assembly)? Declaring an int generates no code at all, so by itself it cannot crash. Do you initialize the int somehow?
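For what it's worth, the faulting instruction in the update, cvtsi2sd, converts a 32-bit integer to a double. So a source line that merely initializes a local from an int can be exactly where an SSE2-only instruction lands. A hypothetical example (the function average is illustrative, and the exact instructions depend on the compiler version and flags):

```cpp
// Hypothetical example: average of n integer samples.
double average(const int* samples, int n)
{
    int total = 0;                 // plain int arithmetic: integer instructions only
    for (int i = 0; i < n; ++i)
        total += samples[i];

    // This declaration converts an int to a double. Built for 32-bit x86
    // with /arch:SSE2, MSVC typically emits cvtsi2sd here (x87 fild
    // otherwise), so a crash can appear to happen on an innocent-looking
    // variable declaration.
    double count = n;
    return n > 0 ? total / count : 0.0;
}
```

Looking at the disassembly around the reported line is what reveals this; the source alone gives no hint that a floating-point conversion is involved.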

To see the code where the crash happened you should perform what is called a postmortem analysis.

Windows Error Reporting

If you want to analyse the crash, you should get a crash dump. One option is to register for Windows Error Reporting. This costs some money (you need a digital code signing ID) and requires filling in some forms. For more, visit https://winqual.microsoft.com/ .

Get the crash dump intended for WER directly from the customer

Another option is to get in touch with a user who is experiencing the crash and get the crash dump intended for WER from them directly. The user can find it by clicking "Technical details" before sending the crash report to Microsoft; the location of the crash dump file is shown there.

Your own minidump

Another option is to register your own exception handler, handle the exception and write a minidump wherever you wish. A detailed description can be found in the Code Project article "Post-Mortem Debugging Your Application with Minidumps and Visual Studio .NET".

做个烂人
Reply #5 · 2019-01-25 12:31

"4) Writing a log shows that the crash happens on a declaration of a local int variable! How could that be? Memory corruption?"

This could be a sign that the hardware is in fact faulty or being pushed too hard. Find out if they've overclocked their computer.

淡お忘
Reply #6 · 2019-01-25 12:32

Try Rational (IBM) PurifyPlus. It catches a lot of errors that BoundsChecker doesn't.

冷血范
Reply #7 · 2019-01-25 12:40

Most heisenbugs / release-only bugs are due to either flow of control that depends on reads from uninitialised memory / stale pointers / past end of buffers, or race conditions, or both.

Try overriding your allocators so they zero out memory when allocating. Does the problem go away (or become more reproducible)?
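In C++, one way to override the allocator is to replace the global operator new and operator delete. A minimal sketch, assuming it is compiled only into a diagnostic build; ALLOC_FILL_BYTE is a hypothetical macro, where 0x00 gives the "zero on allocate" behaviour suggested above and a distinctive value such as 0xCD tends to make reads of uninitialised memory fail loudly instead:

```cpp
#include <cstdlib>
#include <cstring>
#include <new>

// Hypothetical knob: byte used to fill every fresh allocation.
#ifndef ALLOC_FILL_BYTE
#define ALLOC_FILL_BYTE 0x00
#endif

// Replacement global operator new: every allocation made through new
// (and new[], which forwards here by default) is filled with a known byte,
// so code that reads memory it never initialised behaves the same on every run.
void* operator new(std::size_t size)
{
    if (size == 0)
        size = 1;                          // new must still return a unique pointer
    void* p = std::malloc(size);
    if (p == nullptr)
        throw std::bad_alloc();
    std::memset(p, ALLOC_FILL_BYTE, size);
    return p;
}

// Matching deallocation functions, since the memory now comes from malloc.
void operator delete(void* p) noexcept
{
    std::free(p);
}

void operator delete(void* p, std::size_t) noexcept
{
    std::free(p);
}
```

This only covers memory obtained through new. Direct malloc calls, Win32 heap functions such as HeapAlloc, and GDI objects are not affected and would need their own wrappers.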

Writing a log shows that the crash happens on a declaration of a local int variable! How could that be? Memory corruption?

Stack overflow! ;)
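One way a stack overflow can show up exactly like this: a function with large local arrays reserves its whole stack frame on entry (32-bit MSVC probes frames larger than 4 KB through _chkstk), so the overflow surfaces at the function's first statement, even if that is just an int declaration. The default stack reserve for an MSVC-built thread is 1 MB. A hypothetical sketch of the usual remedy, moving the big buffer to the heap:

```cpp
#include <cstddef>
#include <vector>

// Risky version (hypothetical), about 4 MB of locals on a 1 MB stack:
//     int sum_on_stack() { int buf[1 << 20]; int total = 0; ... }
// The overflow happens while the frame is being set up, before any of
// the function's visible code has run.

// Safer version: the storage lives on the heap, the frame stays small.
int sum_on_heap(std::size_t n)
{
    std::vector<int> buf(n);
    for (std::size_t i = 0; i < n; ++i)
        buf[i] = static_cast<int>(i % 10);

    int total = 0;
    for (int v : buf)
        total += v;
    return total;
}
```

Deep or unbounded recursion exhausts the stack in the same way and is worth ruling out too.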
