
Question:

This is an extension of my previous question, Application crash with no explanation.

I have a lot of crashes that are presumably caused by heap corruption on an application server. These crashes only occur in production; they cannot be reproduced in a test environment.

I'm looking for a way to track down these crashes.

Application Verifier was suggested, and it would be fine, but it's unusable on our production server. When we start the application in production under Application Verifier, it becomes so slow that it's completely unusable, even though this is a fairly powerful server (64-bit application, 16 GB memory, 8 processors). Without Application Verifier, it uses only about 1 GB of memory and no more than 10-15% of any processor's cycles.

Are there any other tools that will help find heap corruption, without adding a huge overhead?

Answer 1:

Use the debug version of the Microsoft runtime libraries. Turn on red-zoning and get your heap automatically checked every 128 (say) heap operations by calling _CrtSetDbgFlag() once during initialisation.

_CRTDBG_DELAY_FREE_MEM_DF can be quite useful for finding memory-used-after-free bugs, but your heap size grows monotonically while using it.
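
As a rough sketch of what that initialisation call might look like (the function name enable_debug_heap_checks and its delay_free parameter are made up for illustration; only _CrtSetDbgFlag and the _CRTDBG_* flags come from the CRT's <crtdbg.h>), assuming the program is built with MSVC against the debug runtime:

```cpp
#ifdef _MSC_VER
#include <crtdbg.h>
#endif

// Hypothetical helper: turns on the debug heap's periodic consistency check.
// Returns true only when built with MSVC and _DEBUG defined (debug CRT);
// in any other build it does nothing and returns false.
bool enable_debug_heap_checks(bool delay_free)
{
#if defined(_MSC_VER) && defined(_DEBUG)
    // Read the current flags without changing them.
    int flags = _CrtSetDbgFlag(_CRTDBG_REPORT_FLAG);

    // Debug allocations with guard bytes ("red zones") around each block.
    flags |= _CRTDBG_ALLOC_MEM_DF;

    // The upper 16 bits hold the check frequency: clear them, then ask
    // for a full heap check every 128 heap operations.
    flags = (flags & 0x0000FFFF) | _CRTDBG_CHECK_EVERY_128_DF;

    // Optionally keep freed blocks around (filled with a marker value) so
    // writes after free are caught; the heap only grows while this is on.
    if (delay_free)
        flags |= _CRTDBG_DELAY_FREE_MEM_DF;

    _CrtSetDbgFlag(flags);
    return true;
#else
    (void)delay_free;  // unused outside debug MSVC builds
    return false;
#endif
}
```

You would call something like enable_debug_heap_checks(false) once at the top of main(), before the first allocation you care about. If 128 is still too slow, _CRTDBG_CHECK_EVERY_1024_DF is the next step down in frequency.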

Answer 2:

Would there be any benefit in running it virtualized and taking scheduled snapshots, so that you hopefully get a snapshot from just a little before it actually crashes? Then take the pre-crash snapshot and start it in a lab environment. If you can get it to crash again there, restore the snapshot and start inspecting your server process.

Answer 3:

Mudflap with GCC. It instruments the code at compile time and is meant to be usable on production code.
You have to compile your software with -fmudflap. It checks every invalid pointer access (heap/stack/static). It is designed to work on production code with a modest slowdown (between 1.5x and 5x). You can also disable the checks on read accesses for extra speed.