This is an extension of my previous question, Application crash with no explanation.
I have a lot of crashes that are presumably caused by heap corruption on an application server. These crashes only occur in production; they cannot be reproduced in a test environment.
I'm looking for a way to track down these crashes.
Application Verifier was suggested, and it would be fine, but it's unusable on our production server. When we start the application in production under Application Verifier, it becomes so slow that it's completely unusable, even though this is a fairly powerful server (64-bit application, 16 GB memory, 8 processors). Running without Application Verifier, it only uses about 1 GB of memory and no more than 10-15% of any processor's cycles.
Are there any other tools that will help find heap corruption, without adding a huge overhead?
Use the debug version of the Microsoft runtime libraries. Turn on red-zoning and have your heap checked automatically every 128 (say) heap operations by calling _CrtSetDbgFlag() once during initialisation.
_CRTDBG_DELAY_FREE_MEM_DF can be quite useful for finding use-after-free bugs, but your heap size grows monotonically while it is enabled.
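As a rough sketch of the setup described above, the debug heap flags might be configured like this. The function name enableDebugHeapChecks is hypothetical, and the code assumes an MSVC debug build linked against the debug CRT (/MDd or /MTd); anywhere else it compiles to a no-op that returns false.

```cpp
#if defined(_MSC_VER)
#include <crtdbg.h>
#endif

// Hypothetical helper: turns on periodic debug-heap validation.
// Returns true only when the debug CRT heap is actually in use.
bool enableDebugHeapChecks()
{
#if defined(_MSC_VER) && defined(_DEBUG)
    // Passing _CRTDBG_REPORT_FLAG reads the current flags without changing them.
    int flags = _CrtSetDbgFlag(_CRTDBG_REPORT_FLAG);

    // Make sure debug heap allocations (with guard bytes around each block) are on.
    flags |= _CRTDBG_ALLOC_MEM_DF;

    // The upper 16 bits hold the check frequency; clear them, then ask for
    // a full heap validation every 128 allocation operations.
    flags = (flags & 0x0000FFFF) | _CRTDBG_CHECK_EVERY_128_DF;

    // Optional: keep freed blocks around (filled with a marker pattern) so that
    // writes after free are detected. The heap grows for as long as this is set.
    // flags |= _CRTDBG_DELAY_FREE_MEM_DF;

    _CrtSetDbgFlag(flags);
    return true;
#else
    // Release build or non-MSVC toolchain: the debug heap is not available.
    return false;
#endif
}
```

Calling this once at the very start of main (or the service entry point) should be enough. A lower check frequency, such as _CRTDBG_CHECK_EVERY_1024_DF, trades detection latency for less overhead.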
Would there be any benefit in running it virtualized and taking scheduled snapshots, so that you hopefully can get a snapshot just a little before it actually crashes? Then take the pre-crash snapshot and start it in a lab environment. If you can get it to crash again there, restart the snapshot and start inspecting your server process.
Mudflap with GCC. It instruments your code at compile time and is intended to be usable on production code.
You have to compile your software with -fmudflap. It checks every pointer access (heap/stack/static) for validity. It is designed to work on production code with a modest slowdown (between 1.5x and 5x). You can also disable the checks on read accesses for extra speed.