Buffer overrun during Garbage Collection: psychic

Posted 2019-04-11 02:34

Question:

I'm currently testing a C# (.NET 4.5) WPF application built on top of a C++ library (managed, I believe; I didn't write it). For various (practical) reasons, it's running on a server (with VS2012 installed, yes, yuck).

The program hooks up to a camera (via the library) and displays the image frames that it receives.

What's weird is that I'm getting buffer overruns (buffer overflows I could understand). And during Garbage Collection!

A buffer overrun has occurred in App.exe which has corrupted the program's internal state.

Various other potentially useful tidbits of information:

  • Upping the 'throughput' makes it happen sooner (seconds instead of minutes)
  • Running in VS (debug or release) stops it happening at all (or at least delays it longer than I'm prepared to wait)
  • There's no unsafe code in my C#, and the only 'esoteric' thing I'm doing is converting a bitmap (from the library) into a BitmapSource (like this).
  • The libraries are compiled for x86, the exe too.

Call stack, same every time:

vcr110_clr0400.dll!__crt_debugger_hook ()   Unknown
clr.dll!___raise_securityfailure () Unknown
clr.dll!___report_gsfailure ()  Unknown
clr.dll!CrawlFrame::SetCurGSCookie(unsigned long *) Unknown
clr.dll!StackFrameIterator::Init(class Thread *,class Frame *,struct _REGDISPLAY *,unsigned int)    Unknown
clr.dll!Thread::StackWalkFramesEx(struct _REGDISPLAY *,enum StackWalkAction (*)(class CrawlFrame *,void *),void *,unsigned int,class Frame *)   Unknown
clr.dll!Thread::StackWalkFrames(enum StackWalkAction (*)(class CrawlFrame *,void *),void *,unsigned int,class Frame *)  Unknown
clr.dll!CNameSpace::GcScanRoots(void (*)(class Object * *,struct ScanContext *,unsigned long),int,int,struct ScanContext *,class GCHeap *)  Unknown
clr.dll!WKS::gc_heap::mark_phase(int,int)   Unknown
clr.dll!WKS::gc_heap::gc1(void) Unknown
clr.dll!WKS::gc_heap::garbage_collect(int)  Unknown
clr.dll!WKS::GCHeap::GarbageCollectGeneration(unsigned int,enum WKS::gc_reason) Unknown
clr.dll!WKS::GCHeap::GarbageCollectTry(int,int,int) Unknown
clr.dll!WKS::GCHeap::GarbageCollect(int,int,int)    Unknown
clr.dll!GCInterface::Collect(int,int)   Unknown
mscorlib.ni.dll!6dcd33e5()  Unknown
[Frames below may be incorrect and/or missing, no symbols loaded for mscorlib.ni.dll]   
mscorlib.ni.dll!6dcd33e5()  Unknown
064afa73()  Unknown
clr.dll!MethodTable::FastBox(void * *)  Unknown
clr.dll!MethodTable::CallFinalizer(class Object *)  Unknown
clr.dll!SVR::CallFinalizer(class Object *)  Unknown
clr.dll!SVR::CallFinalizer(class Object *)  Unknown
clr.dll!SVR::CallFinalizer(class Object *)  Unknown
clr.dll!WKS::GCHeap::FinalizerThreadWorker(void *)  Unknown
clr.dll!Thread::DoExtraWorkForFinalizer(void)   Unknown
clr.dll!Thread::DoExtraWorkForFinalizer(void)   Unknown
clr.dll!Thread::DoExtraWorkForFinalizer(void)   Unknown
clr.dll!WKS::GCHeap::FinalizerThreadStart(void *)   Unknown
clr.dll!Thread::intermediateThreadProc(void *)  Unknown
kernel32.dll!@BaseThreadInitThunk@12 () Unknown
ntdll.dll!___RtlUserThreadStart@8 ()    Unknown
ntdll.dll!__RtlUserThreadStart@8 () Unknown

Answer 1:

Unlike the v2 CLR, the v4 CLR was built with the Microsoft secure CRT extensions enabled. These include a check, at function exit, that the "stack canary" didn't get overwritten. The check is enabled by the /GS compiler option.

In the previous version, your program would most likely have ended with a Fatal Execution Engine Exception, triggered by the access violation raised when the function tries to return through a corrupted return address. The v4 CLR catches the problem sooner, and more reliably: a corrupted return address could, by accident, point to valid code. What happens next in that case is usually truly undiagnosable. And exploitable.

But the root cause is the same: the GC heap is getting corrupted.
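
To make the canary check concrete, here is a minimal sketch in C++. It is a toy simulation, not the CLR's or the compiler's actual mechanism: the "stack frame" is just a byte array, and the layout, the cookie value, the return address and all function names are hypothetical choices made for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Toy model of a /GS-protected stack frame, laid out in one byte array:
//   [ 8-byte local buffer ][ 4-byte cookie ][ 4-byte return address ]
// Real frames are laid out by the compiler; this layout is an assumption.
constexpr std::size_t kBufferSize = 8;
constexpr std::size_t kCookieOffset = kBufferSize;
constexpr std::size_t kReturnOffset = kCookieOffset + sizeof(std::uint32_t);
constexpr std::size_t kFrameSize = kReturnOffset + sizeof(std::uint32_t);

constexpr std::uint32_t kCookie = 0xBB40E64Eu;         // hypothetical cookie value
constexpr std::uint32_t kReturnAddress = 0x00401000u;  // hypothetical return address

using Frame = std::array<std::uint8_t, kFrameSize>;

std::uint32_t read_u32(const Frame& frame, std::size_t offset) {
    std::uint32_t value = 0;
    std::memcpy(&value, frame.data() + offset, sizeof(value));
    return value;
}

void write_u32(Frame& frame, std::size_t offset, std::uint32_t value) {
    std::memcpy(frame.data() + offset, &value, sizeof(value));
}

// "Prologue": the cookie sits between the local buffer and the return address.
Frame enter_function() {
    Frame frame{};
    write_u32(frame, kCookieOffset, kCookie);
    write_u32(frame, kReturnOffset, kReturnAddress);
    return frame;
}

// Buggy copy: it trusts `length` and may run past the 8-byte local buffer.
// The assert only keeps the simulation itself inside the frame array.
void unchecked_copy(Frame& frame, const std::uint8_t* src, std::size_t length) {
    assert(length <= kFrameSize);
    std::memcpy(frame.data(), src, length);
}

// "Epilogue" check: did anything overwrite the cookie?
bool cookie_intact(const Frame& frame) {
    return read_u32(frame, kCookieOffset) == kCookie;
}

bool return_address_intact(const Frame& frame) {
    return read_u32(frame, kReturnOffset) == kReturnAddress;
}

// Simulates one call that copies `length` bytes of 0x41 into the local buffer
// and reports whether the cookie check at function exit would pass.
bool call_survives_gs_check(std::size_t length) {
    Frame frame = enter_function();
    std::array<std::uint8_t, kFrameSize> input;
    input.fill(0x41);
    unchecked_copy(frame, input.data(), length);
    return cookie_intact(frame);
}
```

In this model, `call_survives_gs_check(8)` passes because the copy stays inside the buffer, while `call_survives_gs_check(12)` fails: the overrun has to cross the cookie before it can reach the return address, so the check fires before the function returns through a bad address.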



Answer 2:

Looks like memory corruption to me; the library is likely using unsafe and/or unmanaged memory or pinned memory... or maybe it is not pinning the correct bits of memory, or unpinning them too early?

As for:

Running in VS (debug or release) stops it happening at all (or at least delays it longer than I'm prepared to wait)

This is because processes created by a debugger use a different heap (even if you are running in release mode). Switching to this alternate heap is a known source of heisenbugs when dealing with random memory corruption. (I have not found many sources on this point, however; I thought it was on Raymond Chen's blog somewhere, but I only found this.)

EDIT: reference found! From MSDN:

Processes that the debugger creates (also known as spawned processes) behave slightly differently than processes that the debugger does not create.
Instead of using the standard heap API, processes that the debugger creates use a special debug heap. You can force a spawned process to use the standard heap instead of the debug heap by using the _NO_DEBUG_HEAP environment variable or the -hd command-line option.

My best guess is then one of two things. Either the C++ library corrupts some memory; the GC comes along, finds the heap corrupted, and crashes. Or the C++ library forgets to pin the memory it is using as a buffer for images; the GC comes along and moves the memory; the C++ library does not know, and writes through a now-invalid pointer, causing corruption; the GC comes along again, starts working on the now-corrupted memory, and crashes.
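
The second scenario can be sketched with a toy compacting heap in C++. This is only an illustration under stated assumptions, not how the CLR's GC is implemented: `ToyHeap`, its methods, the fill values and the 4-byte object sizes are all hypothetical, and "addresses" are plain offsets into a byte vector.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

struct Object {
    std::size_t offset;  // current "address" inside the heap
    std::size_t size;
    bool pinned;         // pinned objects are never moved by compact()
    bool live;
};

class ToyHeap {
public:
    explicit ToyHeap(std::size_t capacity) : memory_(capacity, 0) {}

    // Bump-allocates `size` bytes filled with `fill`; returns a handle.
    std::size_t allocate(std::size_t size, std::uint8_t fill, bool pinned = false) {
        assert(top_ + size <= memory_.size());
        std::fill_n(memory_.data() + top_, size, fill);
        objects_.push_back({top_, size, pinned, true});
        top_ += size;
        return objects_.size() - 1;
    }

    void release(std::size_t handle) { objects_[handle].live = false; }

    // Sliding compaction: live unpinned objects move down over dead ones,
    // pinned objects stay where they are.
    void compact() {
        std::size_t cursor = 0;
        for (Object& obj : objects_) {
            if (!obj.live) continue;
            if (!obj.pinned && cursor < obj.offset) {
                std::memmove(memory_.data() + cursor,
                             memory_.data() + obj.offset, obj.size);
                obj.offset = cursor;
            }
            cursor = obj.offset + obj.size;
        }
        top_ = cursor;
    }

    std::size_t address_of(std::size_t handle) const { return objects_[handle].offset; }

    // A write through a raw address, as native code would do it:
    // it knows nothing about where objects currently live.
    void raw_write(std::size_t address, std::uint8_t value, std::size_t count) {
        assert(address + count <= memory_.size());
        std::fill_n(memory_.data() + address, count, value);
    }

    // True if every byte of the object still holds `expected`.
    bool holds(std::size_t handle, std::uint8_t expected) const {
        const Object& obj = objects_[handle];
        const std::uint8_t* first = memory_.data() + obj.offset;
        return std::all_of(first, first + obj.size,
                           [expected](std::uint8_t b) { return b == expected; });
    }

private:
    std::vector<std::uint8_t> memory_;
    std::vector<Object> objects_;
    std::size_t top_ = 0;
};

// Returns true if the object next to the frame buffer survives a native write
// made through an address that was cached before the collection ran.
bool neighbour_survives(bool pin_frame_buffer) {
    ToyHeap heap(16);
    std::size_t garbage = heap.allocate(4, 0x11);
    std::size_t frame = heap.allocate(4, 0x22, pin_frame_buffer);
    std::size_t neighbour = heap.allocate(4, 0x33);

    std::size_t native_pointer = heap.address_of(frame);  // cached by "native" code
    heap.release(garbage);
    heap.compact();                                        // the "GC" runs

    heap.raw_write(native_pointer, 0xFF, 4);               // next camera frame arrives
    return heap.holds(neighbour, 0x33);
}
```

With `neighbour_survives(false)` the frame buffer slides down during compaction, the stale address now points at the neighbouring object, and the write corrupts it. With `neighbour_survives(true)` the buffer stays put and the write lands where the native code expects. In real C# code, pinning for the lifetime of the native call would be done with a `fixed` statement or `GCHandle.Alloc(buffer, GCHandleType.Pinned)`.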