C#, VS 2010
I need to determine if a float value is NaN.
Testing a float for NaN using
float.IsNaN(aFloatNumber)
crashes with a stack overflow.
So does
aFloatNumber.CompareTo(float.NaN).
The following does not crash, but it's not useful as it returns NaN regardless:
aFloatNumber - float.NaN
A search for "stack overflow" returns results about this website instead of results about an actual stack overflow, so I can't find relevant answers.
Why is my application going into a stack overflow when testing for NaN?
Edit: the call stack:
Edit: it's clearly something in my code: this statement:
bool aaa = float.IsNaN(float.NaN);
- works OK in the constructor of the application, right after InitializeComponent();
- works OK in the constructor of the class for a custom control, right after InitializeComponent();
- but crashes in an event handler inside the class for a custom control.
So, this is what I am doing:
- Abstract Custom control: public abstract partial class ConfigNumberBaseBox : TextBox
- has a Validating event handler ValidateTextBoxEntry
- ValidateTextBoxEntry is defined inside the ConfigNumberBaseBox class
- Custom control that inherits from ConfigNumberBaseBox : public partial class ConfigTemperBox : ConfigNumberBaseBox
- Run the app
- When I finish editing a ConfigTemperBox control, ValidateTextBoxEntry is called
- ValidateTextBoxEntry runs fine until it encounters float.IsNaN
- stack overflow
Edit:
Debug.WriteLine() shows that the code is executed only once: no recursion.
Edit:
This works:
float fff = 0F;
int iii = fff.CompareTo(float.PositiveInfinity);
This crashes:
float fff = 0F;
int iii = fff.CompareTo(float.NaN);
This is the only real hint towards the underlying problem. Code that runs on a thread can manipulate two stacks inside the processor. One is the normal one that everybody knows about and gave this web site its name. There is however another one, well hidden inside the FPU (Floating Point Unit). It stores intermediate operand values while making floating point calculations. It is 8 levels deep.
A mishap inside the FPU is not supposed to generate a runtime exception. The CLR assumes that the FPU control word has its default configuration, in which the hardware exceptions the FPU can generate are disabled (masked).
That has a knack for going wrong when your program uses code that came from the 1990s, back when enabling FPU exceptions still sounded like a good idea. Code generated by Borland tooling is notorious for this, for example: its C runtime module reprograms the FPU control word and unmasks the hardware exceptions. The exceptions you get as a result can be very mysterious, and using NaN in your code is a good way to trigger one.
This should be at least partially visible with the debugger. Set a breakpoint on the "still good" code and use the Debug + Windows + Registers debugger window. Right-click it and select "Floating point". You'll see all of the registers that are involved with floating point calculations; ST0 through ST7 are the stack registers, for example. The important one here is marked CTRL; its normal value in a .NET process is 027F. The low 6 bits of that value are the exception masking bits (0x3F), all turned on to prevent hardware exceptions.
Single step through the code; the expectation is that you see the CTRL value change. As soon as it does, you'll have found the evil code. If you enable unmanaged debugging then you should also see the load notification in the Output window and see it appear in the Debug + Windows + Modules window.
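To make the bit arithmetic concrete, here is a minimal sketch that decodes a CTRL value read off the Registers window. The class and method names are made up for illustration; the only facts it relies on are that the low 6 bits (0x3F) are the exception masks and that bit 0 is the invalid-operation mask in the x87 control word.

```csharp
using System;

// Hypothetical helper, not part of any library: decodes the x87 control word
// value shown as CTRL in the debugger's Registers window.
static class FpuControlWord
{
    // Low 6 bits: invalid, denormal, zero-divide, overflow, underflow, precision masks.
    const int ExceptionMaskBits = 0x3F;

    // True when every hardware floating-point exception is masked (disabled).
    public static bool AllExceptionsMasked(int ctrl)
    {
        return (ctrl & ExceptionMaskBits) == ExceptionMaskBits;
    }

    // Returns the mask bits that are cleared, i.e. the exceptions that can fire.
    public static int UnmaskedExceptions(int ctrl)
    {
        return ~ctrl & ExceptionMaskBits;
    }
}
```

With the normal value 0x027F, AllExceptionsMasked returns true. A value such as 0x027E (a made-up example with the invalid-operation mask cleared) returns false, and UnmaskedExceptions reports bit 0.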
Undoing the damage that the DLL did is fairly awkward. You'd have to pinvoke _control87() in msvcrt.dll, for example, to restore the CTRL word. Or you can use a simple trick: intentionally throw an exception. The exception handling logic inside the CLR resets the FPU control word. So with some luck, this kind of code is going to solve your problem:
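A minimal sketch of that trick, relying on the behavior described above (the CLR's exception handling restoring the control word). The helper name is hypothetical; in a WinForms app a first guess would be to call it from the form's constructor, right after InitializeComponent().

```csharp
using System;

// Hypothetical helper: the names are illustrative, not from any library.
static class FpuReset
{
    // Throws and immediately swallows a managed exception. The point is the
    // side effect: the CLR's exception handling resets the FPU control word.
    public static void ResetControlWord()
    {
        try
        {
            throw new Exception("Please ignore, resetting the FPU control word");
        }
        catch (Exception)
        {
            // Intentionally empty: the exception exists only for its side effect.
        }
    }
}
```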
You may have to move it; the next best guess is the Load event. The debugger should tell you where.
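For the _control87() route, a hedged sketch of the P/Invoke declaration follows. The wrapper class and method names are made up; 0x0008001F is the value of _MCW_EM in the CRT's float.h, and passing it as both arguments asks the CRT to set every exception mask bit. Treat it as a sketch for a Windows process, not something verified against any particular DLL.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical wrapper around the CRT's _control87; Windows only.
static class NativeFpu
{
    // _MCW_EM from float.h: selects all floating-point exception mask bits.
    public const uint MCW_EM = 0x0008001F;

    // unsigned int _control87(unsigned int new, unsigned int mask);
    [DllImport("msvcrt.dll", CallingConvention = CallingConvention.Cdecl)]
    static extern uint _control87(uint newControl, uint mask);

    // Masks (disables) all hardware floating-point exceptions again.
    // Returns the resulting control word in the CRT's abstract format.
    public static uint MaskAllExceptions()
    {
        return _control87(MCW_EM, MCW_EM);
    }
}
```

The call would go right after whatever native call changed the control word, for example NativeFpu.MaskAllExceptions() immediately after the DLL's initialization function returns.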
I just wrote an example to reproduce the error:
1. Create a native C/C++ DLL which exports this function:
2. Create a C# console program that calls the function SetfloatingControlWord and then does some floating-point operation, such as a NaN comparison. That leads to the stack overflow.
I encountered the same problem years ago. I also noticed that after a .NET exception is thrown, everything works fine. It took me a while to figure out why and to trace the code that changed the FPU control word.
As the documentation for _controlfp_s says: By default, the run-time libraries mask all floating-point exceptions. The common language runtime (CLR) only supports the default floating-point precision, so the CLR doesn't handle these kinds of exceptions.
As MSDN says: By default, the system has all FP exceptions turned off. Therefore, computations result in NAN or INFINITY, rather than an exception.
Once IEEE 754-1985 introduced NaN, the expectation was that application software would no longer need to handle floating-point exceptions.
The solution:
First of all, thank you to @Matt for pointing me in the right direction, and @Hans Passant for providing the workaround.
The application talks to a CAN-USB adapter from Chinese manufacturer QM_CAN.
The problem is in their driver.
The DLL statements and Driver import:
The call to the offending code, including Hans' workaround:
The reason that the application crashed when a reference was made to float.NaN in the event handler and not in the constructor was a simple matter of timing: the constructor is called before InitCanUsbDLL(), but the event handler was called long after InitCanUsbDLL() corrupted the FPU registers.