Heisenbug: WinApi program crashes on some computers

Posted 2019-01-25 12:20

Please help! I'm really at my wits' end. My program is a small personal notes manager (google for "cintanotes"). On some computers (and of course I own none of them) it crashes with an unhandled exception right after startup. There is nothing special about these computers, except that they tend to have AMD CPUs.

Environment: Windows XP, Visual C++ 2005/2008, raw WinApi.

Here is what is certain about this "Heisenbug":

1) The crash happens only in the Release version.

2) The crash goes away as soon as I remove all GDI-related stuff.

3) BoundsChecker reports no problems.

4) Writing a log shows that the crash happens on a declaration of a local int variable! How could that be? Memory corruption?

Any ideas would be greatly appreciated!

UPDATE: I've managed to get the app debugged on a "faulty" PC. The results:

"Unhandled exception at 0x0044a26a in CintaNotes.exe: 0xC000001D: Illegal Instruction."

and the debugger breaks on

0044A26A cvtsi2sd xmm1,dword ptr [esp+14h]

So it seems the problem was the "Code Generation/Enable Enhanced Instruction Set" compiler option. It was set to "/arch:SSE2", and cvtsi2sd (which converts an int to a double) is an SSE2 instruction, so the program crashed on machines whose CPUs don't support SSE2. I've set this option to "Not Set" and the bug is gone. Phew!
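If an SSE2 build is still wanted, one option is to check the CPU at startup and exit with a clear message instead of an Illegal Instruction crash. This is a minimal sketch, assuming MSVC's __cpuid intrinsic or GCC/Clang's __get_cpuid; the function names cpu_has_sse2 and build_requires_sse2 are hypothetical. CPUID leaf 1 reports SSE2 in bit 26 of EDX, and MSVC defines _M_IX86_FP as 2 when compiling x86 code with /arch:SSE2. The guard belongs in a source file compiled without /arch:SSE2, since the compiler may emit SSE2 instructions anywhere in a file built with that option.

```cpp
// Startup guard: detect a CPU without SSE2 before any SSE2 code runs.
// Compile this file WITHOUT /arch:SSE2 so the guard itself is safe.
#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
#  include <intrin.h>
#  define HAVE_MSVC_CPUID 1
#elif (defined(__GNUC__) || defined(__clang__)) && (defined(__i386__) || defined(__x86_64__))
#  include <cpuid.h>
#  define HAVE_GCC_CPUID 1
#endif

// True if the processor reports SSE2 (CPUID leaf 1, EDX bit 26).
bool cpu_has_sse2() {
#if defined(HAVE_MSVC_CPUID)
    int info[4] = {0, 0, 0, 0};              // EAX, EBX, ECX, EDX
    __cpuid(info, 1);
    return (info[3] & (1 << 26)) != 0;
#elif defined(HAVE_GCC_CPUID)
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx)) return false;
    return (edx & (1u << 26)) != 0;
#else
    return false;                            // not x86/x64: SSE2 does not apply
#endif
}

// True if this binary was compiled to use SSE2 instructions.
bool build_requires_sse2() {
#if defined(_M_X64) || defined(__x86_64__)
    return true;                             // x86-64 always includes SSE2
#elif defined(_M_IX86_FP)
    return _M_IX86_FP >= 2;                  // MSVC x86: 2 means /arch:SSE2 or higher
#elif defined(__SSE2__)
    return true;                             // GCC/Clang x86 built with -msse2
#else
    return false;
#endif
}
```

In WinMain this check would run first, showing a MessageBox and returning if build_requires_sse2() is true but cpu_has_sse2() is false. Note that static initializers in files compiled with /arch:SSE2 run before WinMain, so they could still crash before the guard gets a chance.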

Thank you all very much for your help!

11 answers
We Are One
Answer #2 · 2019-01-25 12:43

"4) Writing a log shows that the crash happens on a declaration of a local int variable! How could that be? Memory corruption?"

I've found the cause of numerous "strange crashes" to be dereferencing a broken this pointer inside a member function of that object, i.e. calling a method through a dangling or garbage pointer.
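A cheap way to catch a broken this earlier is a "magic number" member that the constructor sets and the destructor overwrites, checked at the top of member functions. The class below is hypothetical and only a sketch: reading a destroyed or garbage object is itself undefined behavior, so this is a debugging heuristic that often trips early, not a guarantee (it cannot tell, for example, that a pointer refers to a different, still-live object).

```cpp
#include <cstdio>
#include <cstdlib>

// Hypothetical class showing a "magic number" guard against a broken this.
class NoteList {
public:
    NoteList() : magic_(kAlive), count_(0) {}
    ~NoteList() { magic_ = kDead; }          // later calls through a stale pointer fail the check

    void add_note() {
        check_this("NoteList::add_note");
        ++count_;
    }
    int count() const { return count_; }
    bool looks_valid() const { return magic_ == kAlive; }

private:
    // Unlike assert(), this check stays active in Release builds,
    // which is where the crash shows up.
    void check_this(const char *where) const {
        if (magic_ != kAlive) {
            std::fprintf(stderr, "broken this (%p) in %s\n",
                         static_cast<const void *>(this), where);
            std::abort();
        }
    }

    static const unsigned kAlive = 0x4E4F5445u;  // arbitrary marker value
    static const unsigned kDead  = 0xDEADBEEFu;
    volatile unsigned magic_;  // volatile so the optimizer cannot drop the store in the destructor
    int count_;
};
```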

Luminary・发光体
Answer #3 · 2019-01-25 12:43

Sounds like stack corruption to me. My favorite tool to track those down is IDA Pro. Of course, you don't have that kind of access to the user's machine.

Some memory checkers have a hard time catching stack corruption (if that is indeed what this is). I think the surest way to catch it is runtime analysis.

This can also be due to corruption in an exception path, even if the exception was handled. Do you debug with "catch first-chance exceptions" turned on? You should keep it on for as long as you can stand it; in many cases it does get annoying after a while.

Can you send those users a checked version of your application? Check out minidumps: handle that exception and write out a dump (MiniDumpWriteDump in DbgHelp), then use WinDbg to debug it on your end.
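A minimal sketch of the minidump approach, assuming a Visual C++ build linked with dbghelp.lib. MiniDumpWriteDump and SetUnhandledExceptionFilter are real Win32/DbgHelp functions; install_crash_handler, write_minidump and the dump file name are hypothetical. Writing a dump from inside the crashing process is not fully reliable, but it is usually good enough for a crash like this one.

```cpp
// Sketch: write a minidump when the process hits an unhandled exception.
// Windows-only; link with dbghelp.lib. The dump file name is hypothetical.
#ifdef _WIN32
#include <windows.h>
#include <dbghelp.h>

static LONG WINAPI write_minidump(EXCEPTION_POINTERS *info) {
    HANDLE file = CreateFileW(L"CintaNotes-crash.dmp", GENERIC_WRITE, 0, NULL,
                              CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
    if (file != INVALID_HANDLE_VALUE) {
        MINIDUMP_EXCEPTION_INFORMATION mei;
        mei.ThreadId = GetCurrentThreadId();
        mei.ExceptionPointers = info;        // lets WinDbg show the faulting instruction
        mei.ClientPointers = FALSE;          // pointers are in this process
        MiniDumpWriteDump(GetCurrentProcess(), GetCurrentProcessId(), file,
                          MiniDumpNormal, &mei, NULL, NULL);
        CloseHandle(file);
    }
    return EXCEPTION_EXECUTE_HANDLER;        // usually ends the process without the error dialog
}
#endif

// Call once at startup. Returns false on platforms without minidumps.
bool install_crash_handler() {
#ifdef _WIN32
    SetUnhandledExceptionFilter(write_minidump);
    return true;
#else
    return false;
#endif
}
```

Opening the .dmp in WinDbg together with the PDB files from the exact same Release build and running !analyze -v should point at the faulting instruction, such as the cvtsi2sd in the update above. In a real installation the dump should go to a writable folder such as the user's application-data directory rather than the current directory.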

Another method is writing very detailed logs. Create a "Log every single action" option, and ask the user to turn it on and send the log to you. Dump memory contents to the log as well. Check out _CrtDbgReport() on MSDN.
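A sketch of such a "log every single action" switch; all names are hypothetical. The important detail is flushing after every line: output still sitting in a buffer when the process dies is lost, so the last line in the file can point well before the real crash site. Even with flushing, an optimized Release build only lets the log narrow the crash down to the code between two log calls, because the compiler can move instructions across source lines; that is one way a crash can appear to happen "on a declaration of a local int variable".

```cpp
#include <cstdarg>
#include <cstdio>

// Hypothetical "log every single action" option, e.g. toggled from a
// settings dialog or a command-line switch.
static bool g_log_everything = false;
static std::FILE *g_log = NULL;

void enable_action_log(std::FILE *file, bool log_everything) {
    g_log = file;
    g_log_everything = log_everything;
}

// Writes one formatted line and flushes it immediately, so the line is
// handed to the OS before the next action runs and survives a process crash.
void log_action(const char *fmt, ...) {
    if (!g_log_everything || g_log == NULL) return;
    std::va_list args;
    va_start(args, fmt);
    std::vfprintf(g_log, fmt, args);
    va_end(args);
    std::fputc('\n', g_log);
    std::fflush(g_log);
}
```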

Good Luck!

EDIT:

Responding to your comment: An error on a local variable declaration is not surprising to me. I've seen this a lot. It's usually due to a corrupted stack.

For example, some variable on the stack may be running over its boundaries. All hell breaks loose after that: stack variable declarations throw random memory errors, virtual tables get corrupted, etc.
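The usual way a stack variable runs over its boundaries is an unchecked strcpy or sprintf into a fixed-size local array. A sketch of a bounded copy; copy_bounded and set_window_title are hypothetical helpers. Visual C++'s /GS option also places a security cookie on the stack and terminates the process if it has been overwritten when the function returns, which catches some of these overruns even in Release builds.

```cpp
#include <cstddef>
#include <cstring>

// Copies src into dst (capacity dst_size bytes, including the terminator)
// without ever writing past the end. Returns false if src was truncated.
bool copy_bounded(char *dst, std::size_t dst_size, const char *src) {
    if (dst_size == 0) return false;
    std::size_t len = std::strlen(src);
    std::size_t n = (len < dst_size - 1) ? len : dst_size - 1;
    std::memcpy(dst, src, n);
    dst[n] = '\0';
    return n == len;
}

// Hypothetical caller: the commented-out strcpy is the classic overrun that
// tramples whatever lies next to the array on the stack (other locals, the
// saved frame pointer, the return address).
void set_window_title(const char *user_text) {
    char title[64];
    // std::strcpy(title, user_text);   // overruns if user_text has 64+ chars
    copy_bounded(title, sizeof title, user_text);
    (void)title;                        // the real program would pass title to SetWindowText()
}
```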

Whenever I've been stuck on one of these for a prolonged period, I've had to go to IDA Pro. Detailed runtime disassembly debugging is the only thing I know of that finds these reliably.

Many developers use WinDbg for this kind of analysis. That's why I also suggested Minidump.

Explosion°爆炸
Answer #4 · 2019-01-25 12:46

When I get this type of thing, I try running the code through Gimpel's PC-Lint (static code analysis), as it checks for different classes of errors than BoundsChecker does. If you are using BoundsChecker, turn on the memory poisoning options.

You mention AMD CPUs. Have you investigated whether the crashing machines share a similar graphics card, driver version, and/or configuration? Does it always crash on these machines or just occasionally? Maybe run the System Information tool on these machines and see what they have in common.
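To compare the crashing machines more systematically, the program itself can record the CPU vendor in its log or crash report. A minimal sketch, assuming MSVC's __cpuid or GCC/Clang's __get_cpuid; cpu_vendor is a hypothetical name. CPUID leaf 0 returns the 12-character vendor string in EBX, EDX, ECX, e.g. "GenuineIntel" or "AuthenticAMD".

```cpp
#include <cstring>
#include <string>
#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
#  include <intrin.h>
#  define HAVE_MSVC_CPUID 1
#elif (defined(__GNUC__) || defined(__clang__)) && (defined(__i386__) || defined(__x86_64__))
#  include <cpuid.h>
#  define HAVE_GCC_CPUID 1
#endif

// Returns the CPUID vendor string, or an empty string on non-x86 targets.
std::string cpu_vendor() {
#if defined(HAVE_MSVC_CPUID)
    int info[4] = {0, 0, 0, 0};                  // EAX, EBX, ECX, EDX
    __cpuid(info, 0);
    int regs[3] = {info[1], info[3], info[2]};   // EBX, EDX, ECX: vendor order
#elif defined(HAVE_GCC_CPUID)
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (!__get_cpuid(0, &eax, &ebx, &ecx, &edx)) return std::string();
    unsigned int regs[3] = {ebx, edx, ecx};      // EBX, EDX, ECX: vendor order
#else
    return std::string();                        // no CPUID outside x86/x64
#endif
#if defined(HAVE_MSVC_CPUID) || defined(HAVE_GCC_CPUID)
    char vendor[13];
    std::memcpy(vendor, regs, 12);               // x86 is little-endian: bytes already in order
    vendor[12] = '\0';
    return std::string(vendor);
#endif
}
```

Logging this together with the SSE2 bit from CPUID leaf 1 would likely have shown that the common factor was a CPU without SSE2 rather than the AMD brand itself.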

一纸荒年 Trace。
Answer #5 · 2019-01-25 12:51

So it doesn't crash in the Debug configuration? Many things differ in a Release configuration: 1) initialization of memory (the Debug runtime fills uninitialized stack and heap memory with marker values), 2) the actual machine code generated, etc.

So the first step is to find out the exact setting of each option in Release mode compared to Debug mode.
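One way to see which compile-time settings actually differ is to log them at startup in both configurations. A sketch; build_settings is a hypothetical name. _DEBUG (defined when building against the debug CRT) and NDEBUG (defined by the default Release configuration) are the usual Visual C++ macros, _M_IX86_FP reflects MSVC's /arch setting on x86, and __SSE2__ and __OPTIMIZE__ are GCC/Clang macros. In this thread, an _M_IX86_FP value of 2 in the Release log would have pointed straight at /arch:SSE2.

```cpp
#include <string>

// Describes compile-time settings that often differ between Debug and
// Release, so the two logs can be compared line by line.
std::string build_settings() {
    std::string s;
#if defined(_DEBUG)
    s += "_DEBUG ";
#endif
#if defined(NDEBUG)
    s += "NDEBUG ";
#endif
#if defined(_M_IX86_FP)
    // MSVC x86: 0 = x87 only, 1 = /arch:SSE, 2 = /arch:SSE2 or higher
    s += "_M_IX86_FP=" + std::string(1, char('0' + _M_IX86_FP)) + " ";
#endif
#if defined(__SSE2__)
    s += "__SSE2__ ";          // GCC/Clang: compiler may emit SSE2 instructions
#endif
#if defined(__OPTIMIZE__)
    s += "__OPTIMIZE__ ";      // GCC/Clang: optimizations enabled
#endif
    if (s.empty()) s = "(none of the checked macros defined)";
    return s;
}
```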

-AD

ら.Afraid
Answer #6 · 2019-01-25 12:53

What does the crash say? Access violation? Some other exception? That would be a further clue for solving this.

Ensure there is no earlier memory corruption, using PageHeap.exe.

Ensure you have no stack overflow (e.g. a huge local such as CBig array[1000000]).

Ensure that you have no uninitialized memory.
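The stack-overflow and uninitialized-memory checks above can be sketched together. CBig is the answer's placeholder, given a hypothetical 256-byte layout here; make_big_array and make_zeroed_item are also hypothetical helpers.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the "CBig" type in the answer.
struct CBig {
    char data[256];
};

// A local "CBig array[1000000];" needs about 256 MB of stack, far more than
// the 1 MB that the Visual C++ linker reserves by default, so it overflows.
// A vector stores the elements on the heap, and constructing it with a count
// value-initializes every element, which zeroes a plain struct like CBig.
std::vector<CBig> make_big_array(std::size_t count) {
    return std::vector<CBig>(count);
}

// A plain "CBig item;" local would hold whatever was on the stack before;
// "CBig()" value-initializes it, so every byte starts at zero.
CBig make_zeroed_item() {
    CBig item = CBig();
    return item;
}
```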

You can also run the Release version inside the debugger once you generate debug symbols for it (which is not the same as building the Debug version; in Visual C++ this means /Zi for the compiler and /DEBUG for the linker). Step through and see whether any warnings appear in the debugger's output window.
