This has got me stumped. I was trying to optimize some tests for Noda Time, where we have some type initializer checking. I thought I'd find out whether a type has a type initializer (static constructor or static variables with initializers) before loading everything into a new AppDomain. To my surprise, a small test of this threw NullReferenceException - despite there being no null values in my code. It only throws the exception when compiled with no debug information.
Here's a short but complete program to demonstrate the problem:
using System;

class Test
{
    static Test() {}

    static void Main()
    {
        var cctor = typeof(Test).TypeInitializer;
        Console.WriteLine("Got initializer? {0}", cctor != null);
    }
}
And a transcript of compilation and output:
c:\Users\Jon\Test>csc Test.cs
Microsoft (R) Visual C# Compiler version 4.0.30319.17626
for Microsoft (R) .NET Framework 4.5
Copyright (C) Microsoft Corporation. All rights reserved.
c:\Users\Jon\Test>test
Unhandled Exception: System.NullReferenceException: Object reference not set to
an instance of an object.
at System.RuntimeType.GetConstructorImpl(BindingFlags bindingAttr, Binder bin
der, CallingConventions callConvention, Type[] types, ParameterModifier[] modifi
ers)
at Test.Main()
c:\Users\Jon\Test>csc /debug+ Test.cs
Microsoft (R) Visual C# Compiler version 4.0.30319.17626
for Microsoft (R) .NET Framework 4.5
Copyright (C) Microsoft Corporation. All rights reserved.
c:\Users\Jon\Test>test
Got initializer? True
Now you'll notice I'm using .NET 4.5 (the release candidate) - which may be relevant here. It's somewhat tricky for me to test it with the various other original frameworks (in particular "vanilla" .NET 4) but if anyone else has easy access to machines with other frameworks, I'd be interested in the results.
Other details:
- I'm on an x64 machine, but this problem occurs with both x86 and x64 assemblies
- It's the "debug-ness" of the calling code which makes a difference - even though in the test case above it's testing it on its own assembly, when I tried this against Noda Time I didn't have to recompile NodaTime.dll to see the differences - just Test.cs which referred to it.
- Running the "broken" assembly on Mono 2.10.8 doesn't throw
Any ideas? Framework bug?
EDIT: Curiouser and curiouser. If you take out the Console.WriteLine call:
using System;

class Test
{
    static Test() {}

    static void Main()
    {
        var cctor = typeof(Test).TypeInitializer;
    }
}
It now only fails when compiled with csc /o- /debug-. If you turn on optimizations (/o+), it works. But if you include the Console.WriteLine call as per the original, both versions will fail.
With csc test.cs, the crash is an attempt to load from [rsi+8] when @rsi is NULL. Let's inspect the function: @rsi is loaded at the beginning from [rsp+20h], so it must be passed by the caller. Let's look at the caller. (My disassembly shows System.Console.get_In because I added a Console.ReadLine() in test.cs to have an opportunity to break in the debugger. I validated that it doesn't change the behavior.) We're in this call:
000007fe8d45010c 41ff5228 call qword ptr [r10+28h]
(Our AV frame return address is the instruction right after this call.)
Let's compare this with what happens when we compile with csc /debug test.cs. We can set up a bp 000007fee5735360; luckily the module loads at the same address. On the instruction that loads @rsi, note that @rsi is 00000000002debd8. Stepping through the function shows that this is the address that will be dereferenced later, at the place where the bad exe bombs (i.e. @rsi does not change).
The stack is very interesting because it shows an extra frame. The call is the same call qword ptr [r10+28h] that we've seen before, so in the bad case this function was probably inlined into Main(), and the extra frame is a red herring. If we look at the preparation of this call qword ptr [r10+28h], we notice this instruction: mov qword ptr [rsp+20h],rcx. This is what loads the address which eventually gets dereferenced as @rsi.
The way @rcx is loaded is very different between the good case and the bad case. The good case calls CORINFO_HELP_GETSHARED_GCSTATIC_BASE and reads what ends up as the critical pointer (the one that causes the AV) from some member at offset 1F0 in a return structure. The optimized code instead loads it from a static address. And of course 12721220h contains NULL.
Unfortunately it is too late for me to dig deeper right now; the disassembly of CORINFO_HELP_GETSHARED_GCSTATIC_BASE is far from trivial. I'm posting this in the hope that someone more knowledgeable in CLR internals can make sense of it (as you can see, I really considered the issue just from the native instructions POV and completely ignored IL).
As I believe I've found some interesting new findings about the problem, I decided to add them as an answer, while acknowledging that they do not address the "why it happens" part of the original question. Maybe someone who knows more about the internal workings of the involved types can post an edifying answer based on the observations I'm posting.
I've also managed to reproduce the issue on my machine and I've tracked a connection with the System.Runtime.InteropServices._Type interface, which is implemented by the System.Type class.
Initially, I found at least 3 workaround approaches for fixing the problem:
1. Simply by casting the Type to _Type inside the Main method.
2. Or making sure that approach 1 was used previously inside the method.
3. Or by adding a static field to the Test class and initializing it (with casting it to _Type).
Later on, I discovered that if we don't want to involve the System.Runtime.InteropServices._Type interface in the workarounds, the problem doesn't occur either by:
- Adding a static field to the Test class and initializing it (without casting it to _Type).
- Or by initializing the cctor variable itself as a static field of the class.
I'm looking forward to your feedback.
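
For illustration, the first approach (casting the Type to _Type inside the Main method) could look roughly like the minimal sketch below. It assumes the cast is applied directly to typeof(Test) in the original test program and that the code targets the .NET Framework, where System.Type implements the _Type interface. The exact placement of the cast is an assumption, not necessarily the code that was tested, and no claim is made here about what it prints on any particular framework build.

```csharp
using System;
using System.Runtime.InteropServices;

class Test
{
    static Test() {}

    static void Main()
    {
        // Assumption: read TypeInitializer through the _Type interface
        // instead of calling it on System.Type directly.
        var cctor = ((_Type)typeof(Test)).TypeInitializer;
        Console.WriteLine("Got initializer? {0}", cctor != null);
    }
}
```

To compare against the failing configuration from the question, this would be compiled the same way as the original (plain csc Test.cs, without /debug+).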