Summary:
I periodically get a .NET Fatal Execution Engine Error on an application which I cannot seem to debug. The dialog that comes up only offers to close the program or send information about the error to Microsoft. I've tried looking at the more detailed information but I don't know how to make use of it.
Error:
The error is visible in Event Viewer under Applications and is as follows:
.NET Runtime version 2.0.50727.3607 -
Fatal Execution Engine Error
(7A09795E) (80131506)
The computer running it is Windows XP Professional SP3 (Intel Core 2 Quad Q6600 2.4 GHz with 2.0 GB of RAM). Other .NET-based projects that lack multi-threaded downloading (see below) seem to run just fine.
Application:
The application is written in C#/.NET 3.5 using VS2008, and installed via a setup project.
The app is multi-threaded and downloads data from multiple web servers using System.Net.HttpWebRequest and its methods. I've determined that the .NET error has something to do with either threading or HttpWebRequest, but I haven't been able to get any closer, as this particular error seems impossible to debug.
I've tried handling errors on many levels, including the following in Program.cs:
// handle UI thread exceptions
Application.ThreadException += Application_ThreadException;
// handle non-UI thread exceptions
AppDomain.CurrentDomain.UnhandledException += CurrentDomain_UnhandledException;
Application.EnableVisualStyles();
Application.SetCompatibleTextRenderingDefault(false);
// force all windows forms errors to go through our handler
Application.SetUnhandledExceptionMode(UnhandledExceptionMode.CatchException);
More Notes and What I've Tried...
- Installed Visual Studio 2008 on the target machine and tried running in debug mode, but the error still occurs, with no hint as to where in source code it occurred.
- When running the program from its installed version (Release) the error occurs more frequently, usually within minutes of launching the application. When running the program in debug mode inside of VS2008, it can run for hours or days before generating the error.
- Reinstalled .NET 3.5 and made sure all updates are applied.
- Broke random cubicle objects in frustration.
- Rewrote parts of the code that deal with threading and downloading in an attempt to catch and log exceptions, though logging seemed to aggravate the problem (and never provided any data).
Question:
What steps can I take to troubleshoot or debug this kind of error? Memory dumps and the like seem to be the next step, but I'm not experienced at interpreting them. Perhaps there's something more I can do in the code to try and catch errors... It would be nice if the "Fatal Execution Engine Error" was more informative, but internet searches have only told me that it's a common error for a lot of .NET-related items.
Well, you've got a Big Problem. That exception is raised by the CLR when it detects that the integrity of the garbage-collected heap is compromised. Heap corruption: the bane of any programmer who ever wrote code in an unmanaged language like C or C++.
Those languages make it very easy to corrupt the heap; all it takes is writing past the end of an array that's allocated on the heap. Or using memory after it has been released. Or having a bad value for a pointer. The kind of bugz that managed code was invented to solve.
But you are using managed code, judging from your question. Well, mostly: your code is managed. But you are executing lots of unmanaged code. All the low-level code that actually makes an HttpWebRequest work is unmanaged. And so is the CLR; it was written in C++, so it is technically just as likely to corrupt the heap. But after over four thousand revisions of it, and millions of programs using it, the odds that it still suffers from heap cooties are very small.
The same isn't true for all the other unmanaged code that wants a piece of HttpWebRequest: the code you don't know about because you didn't write it and Microsoft doesn't document it. Your firewall. Your virus scanner. Your company's Internet usage monitor. Lord knows whose "download accelerator".
Isolate the problem: assume it is neither your code nor Microsoft's code that causes it. Assume it is environmental first, and get rid of the crapware.
For an epic environmental FEEE story, read this thread.
Since the previous suggestions are fairly generic in nature, I thought it might be useful to post my own battle against this exception, with specific code examples, the background changes I made that caused this exception to appear, and how I solved it.
First, the short version: I was using an in-house DLL written in C++ (unmanaged). I passed in an array of a specific size from my .NET executable. The unmanaged code attempted to write past the end of that array, into memory the managed code had never allocated for it. BOOM.
Below is the long version:
I am using an unmanaged DLL developed in-house, written in C++. My own GUI development is in C# on .NET 4.0. I am calling a variety of those unmanaged methods; that DLL effectively acts as my data source. An example extern definition from the DLL:
[DllImport(@"C:\Program Files\MyCompany\dataSource.dll",
EntryPoint = "get_sel_list",
CallingConvention = CallingConvention.Winapi)]
private static extern int ExternGetSelectionList(
uint parameterNumber,
uint[] list,
uint[] limits,
ref int size);
I then wrap the methods in my own interface for use throughout my project:
/// <summary>
/// Get the data for a ComboBox (Drop down selection).
/// </summary>
/// <param name="parameterNumber"> The parameter number</param>
/// <param name="messageList"> Message number </param>
/// <param name="valueLimits"> The limits </param>
/// <param name="size"> The maximum size of the memory buffer to
/// allocate for the data </param>
/// <returns> 0 - If successful, something else otherwise. </returns>
public int GetSelectionList(uint parameterNumber,
ref uint[] messageList,
ref uint[] valueLimits,
int size)
{
int returnValue = -1;
returnValue = ExternGetSelectionList(parameterNumber,
messageList,
valueLimits,
ref size);
return returnValue;
}
An example call of this method:
uint[] messageList = new uint[3];
uint[] valueLimits = new uint[3];
int dataReferenceParameter = 1;
// BUFFERSIZE = 255.
MainNavigationWindow.MainNavigationProperty.DataSourceWrapper.GetSelectionList(
dataReferenceParameter,
ref messageList,
ref valueLimits,
BUFFERSIZE);
In the GUI, one navigates through different pages containing a variety of graphics and user inputs. The previous method let me get the data to populate ComboBoxes. An example of my navigation setup and call from before this exception appeared:
In my host window, I set up a property:
/// <summary>
/// Gets or sets the User interface page
/// </summary>
internal UserInterfacePage UserInterfacePageProperty
{
get
{
if (this.userInterfacePage == null)
{
this.userInterfacePage = new UserInterfacePage();
}
return this.userInterfacePage;
}
set { this.userInterfacePage = value; }
}
Then, when needed, I navigate to the page:
MainNavigationWindow.MainNavigationProperty.Navigate(
MainNavigation.MainNavigationProperty.UserInterfacePageProperty);
Everything worked well enough, though I did have some serious memory-creep issues. When navigating using the object (the NavigationService.Navigate(Object) overload), the default setting for the KeepAlive property is true. But the issue is more nefarious than that. Even if you specifically set KeepAlive to false in the constructor of that page, the page is still left alone by the garbage collector as if it were true. Now, for many of my pages, this was no big deal. They had small memory footprints with not all that much going on. But many of the other pages had large, highly detailed graphics on them for illustration purposes. It wasn't long before normal usage of this interface by operators of our equipment caused huge allocations of memory that never cleared and eventually clogged up all the processes on the machine.
After the rush of initial development subsided from a tsunami to more of a tidal bore, I finally decided to tackle the memory leaks once and for all. I won't go into the details of all the tricks I implemented to clean up the memory (WeakReferences to images, unhooking event handlers on Unload(), using a custom timer implementing the IWeakEventListener interface, etc.). The key change I made was to navigate to the pages using the Uri instead of the object (the NavigationService.Navigate(Uri) overload). There are two important differences when using this type of navigation:
- KeepAlive is set to false by default.
- The garbage collector now will try to clean up the navigation object as if KeepAlive were set to false.
So now my navigation looks like:
MainNavigation.MainNavigationProperty.Navigate(
new Uri("/Pages/UserInterfacePage.xaml", UriKind.Relative));
Something else to note here: this affects not only how the objects are cleaned up by the garbage collector but also how they are initially allocated in memory, as I would soon find out.
Everything seemed to work great. My memory would quickly get cleaned up to near its initial state as I navigated through the graphics-intensive pages, until I hit this particular page with that particular call to the dataSource DLL to fill in some ComboBoxes. Then I got this nasty FatalExecutionEngineError. After days of research turning up vague suggestions, or highly specific solutions that didn't apply to me, and after unleashing just about every debugging weapon in my personal programming arsenal, I finally decided that the only way I was really going to nail this down was the extreme measure of rebuilding an exact copy of this particular page, element by element, method by method, line by line, until I finally came across the code that threw this exception. It was as tedious and painful as I'm implying, but I finally tracked it down.
It turned out to be in the way the unmanaged dll was allocating memory to write data into the arrays I was sending in for populating. That particular method would actually look at the parameter number and, from that information, allocate an array of a particular size based on the amount of data it expected to write into the array I sent in. The code that crashed:
uint[] messageList = new uint[2];
uint[] valueLimits = new uint[2];
int dataReferenceParameter = 1;
// BUFFERSIZE = 255.
MainNavigationWindow.MainNavigationProperty.DataSourceWrapper.GetSelectionList(
dataReferenceParameter,
ref messageList,
ref valueLimits,
BUFFERSIZE);
This code might seem identical to the sample above, but it has one tiny difference: the array size I allocate is 2, not 3. I did this because I knew that this particular ComboBox would only have two selection items, as opposed to the other ComboBoxes on the page, which all had three. However, the unmanaged code didn't see things the way I did. It got the arrays I handed in and tried to write three elements into my two-element allocations, and that was it. *bang!* *crash!* I changed the allocation size to 3, and the error went away.
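One way to guard against this class of bug is to make the managed side always allocate the largest buffer the native code might fill, then copy out only the entries the caller actually wants. The sketch below is hypothetical: it assumes the native method never writes more than three entries per array (the MaxItems constant comes from this story, not from any documentation of the DLL), and it takes the native call as a delegate (SelectionListCall, also a made-up name) so it can be exercised without the real dataSource.dll.

```csharp
using System;

// Hypothetical delegate with the same shape as ExternGetSelectionList.
public delegate int SelectionListCall(uint parameterNumber, uint[] list, uint[] limits, ref int size);

public static class SafeSelectionList
{
    // Assumption: the native code never writes more than this many entries into each array.
    public const int MaxItems = 3;

    public static int Get(SelectionListCall native, uint parameterNumber, int wantedItems,
                          int bufferSize, out uint[] messageList, out uint[] valueLimits)
    {
        if (native == null) throw new ArgumentNullException("native");
        if (wantedItems < 0 || wantedItems > MaxItems)
            throw new ArgumentOutOfRangeException("wantedItems");

        // Always hand the native side full-size buffers, however few items this caller needs.
        uint[] fullList = new uint[MaxItems];
        uint[] fullLimits = new uint[MaxItems];
        int size = bufferSize;
        int result = native(parameterNumber, fullList, fullLimits, ref size);

        // Copy out only the entries the caller asked for.
        messageList = new uint[wantedItems];
        valueLimits = new uint[wantedItems];
        Array.Copy(fullList, messageList, wantedItems);
        Array.Copy(fullLimits, valueLimits, wantedItems);
        return result;
    }
}
```

Inside the class that declares the extern, the real method could be passed as the method group ExternGetSelectionList, since its signature matches the delegate. The two-item ComboBox would then ask for wantedItems = 2 while the native code still receives three-element arrays.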
Now, this particular code had already been running without this error for at least a year. But the simple act of navigating to this page via a Uri as opposed to an Object caused the crash to appear. This implies that the initial object must be allocated differently depending on the navigation method I used. With my old navigation method, the memory was just piled into place and left alone for eternity, so it didn't seem to matter if it was a bit corrupted in one or two small locations. Once the garbage collector had to actually do something with that memory (such as clean it up), it detected the memory corruption and threw the exception. Ironically, my major memory leak was covering up a fatal memory error!
Obviously we are going to review this interface to avoid such simple assumptions causing such crashes in the future. Hope this helps guide some others to find out what's going on in their own code.
A presentation that might be a nice tutorial on where to start with this kind of issue is this: Hardcore production debugging in .NET by Ingo Rammer.
I do a bit of C++/CLI coding, and heap corruption doesn't usually result in this error; usually heap corruption causes either data corruption and a subsequent normal exception, or a memory protection error, though that observation probably doesn't mean much.
In addition to trying .NET 4.0 (which loads unmanaged code differently), you should compare the x86 and x64 editions of the CLR, if possible. The x64 version has a larger address space and thus completely different malloc (and fragmentation) behavior, so you just might get lucky and hit a different, more debuggable error there (if it occurs at all).
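When comparing x86 and x64 runs, it helps to stamp every log entry or crash report with the runtime it came from, so the two sets of results can't be mixed up. A minimal sketch (RuntimeInfo is a hypothetical name); it uses IntPtr.Size, which is available on the .NET 2.0 runtime from the question, rather than Environment.Is64BitProcess, which requires .NET 4.0:

```csharp
using System;

public static class RuntimeInfo
{
    // Describe the running CLR: its version, whether this process is 32- or 64-bit, and the OS.
    public static string Describe()
    {
        int bits = IntPtr.Size * 8;  // 4-byte pointers => 32-bit process, 8-byte => 64-bit
        return string.Format("CLR {0}, {1}-bit process, OS {2}",
            Environment.Version, bits, Environment.OSVersion);
    }
}
```

To force a 32-bit run of an otherwise Any CPU build, the project's Platform target can be set to x86.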
Also, when you run under Visual Studio, have you turned on unmanaged code debugging (a project option)? And do you have Managed Debugging Assistants turned on?
In my case I had installed an exception handler on AppDomain.CurrentDomain.FirstChanceException. This handler was logging some exceptions, and all was fine for a few years (actually, this debugging code should never have stayed in production).
But after a configuration error, the logger started to fail and the handler itself started throwing, which apparently resulted in a FatalExecutionEngineError seemingly coming from nowhere.
So anyone encountering this error could spend a few seconds searching the code for occurrences of FirstChanceException and maybe save a few hours of head scratching :)
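If you do need such a handler, the important rule is that nothing may escape from it: the FirstChanceException documentation warns that an exception thrown in the handler raises the event again, recursively, which can end in a stack overflow and termination of the application. A minimal sketch, assuming .NET 4.0 or later (where the event was introduced) and a caller-supplied logging delegate; SafeFirstChanceLogger is a hypothetical name:

```csharp
using System;
using System.Runtime.ExceptionServices;

public static class SafeFirstChanceLogger
{
    [ThreadStatic] private static bool inHandler;   // per-thread reentrancy guard
    private static Action<string> sink;

    public static void Install(Action<string> logSink)
    {
        if (logSink == null) throw new ArgumentNullException("logSink");
        sink = logSink;
        AppDomain.CurrentDomain.FirstChanceException += OnFirstChance;
    }

    public static void Uninstall()
    {
        AppDomain.CurrentDomain.FirstChanceException -= OnFirstChance;
    }

    private static void OnFirstChance(object sender, FirstChanceExceptionEventArgs e)
    {
        // An exception thrown while we are logging raises this event again; ignore it.
        if (inHandler) return;
        inHandler = true;
        try
        {
            Action<string> target = sink;
            if (target != null)
                target(e.Exception.GetType().FullName + ": " + e.Exception.Message);
        }
        catch
        {
            // Swallow everything: a handler that throws is exactly the failure described here.
        }
        finally
        {
            inHandler = false;
        }
    }
}
```

The reentrancy flag is thread-static because first-chance exceptions on other threads are unrelated and should still be logged.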