What is the CLR implementation behind raising/gene

Posted 2019-02-22 13:09

Question:

We all come across this particular exception, one of the most common, at one time or another in our development lives. My question is NOT about WHY it happens (I am aware it is raised when we try to access members of a reference variable that actually points to null) but about HOW the NullReferenceException is generated by the CLR.

Sometimes I wonder about the mechanism by which the CLR identifies a reference to null (perhaps null is a reserved space in memory) and then raises an exception. How does the CLR identify this situation and raise this particular exception? Does the OS play any role in it?

I would like to share one of the most interesting claims about it:

null is actually a permanently reserved memory space known to the CLR, and all kinds of access to it are prohibited. Thus, when a reference to that space is found, the OS by default generates an access-denied kind of exception, which the CLR interprets as a NullReferenceException.

I haven't found any articles or posts supporting the above statement, so I find it hard to believe. Maybe I haven't dug into the details enough, or there are other reasons; I expect Stack Overflow is one of the most appropriate platforms for getting the best response.

Answer 1:

It doesn't have to work this way (there could be explicit checks), but in practice it works by trapping access violations.

A .NET object will be turned into a native object: its fields become a block of memory laid out in a particular manner, its methods are jitted into native machine code, and a v-table or other virtual method override mechanism is created.

  1. Accessing a field, then, means finding the address of the object, adding on the offset of the member, and reading or writing the piece of memory referred to.

  2. Calling a virtual method means finding the address of the object, finding its method table (at a set offset within the object), finding the method's address (at a set offset within the table) and calling the method at that address, with the address of the object being passed (the this pointer).

  3. Calling a non-virtual method means calling the method with the address of the object passed (the this pointer).

Clearly, if there is not an actual object at the address in question, cases 1 and 2 will go wrong in some way, while case 3 will work (but could in turn lead to case 1 or 2). There are two main ways this can go wrong:
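
As a minimal sketch of cases 1 and 2 at the C# level (the `Point` class and the `Probe` helper are made up for this illustration), a field read and a virtual call through a null reference both surface as a NullReferenceException:

```csharp
using System;

// Hypothetical type used only for this illustration.
class Point
{
    public int X;
    public virtual int GetX() => X;
}

static class NullAccessDemo
{
    // Runs the action and reports the type name of whatever exception it throws.
    public static string Probe(Action action)
    {
        try { action(); return "no exception"; }
        catch (Exception e) { return e.GetType().Name; }
    }

    static void Main()
    {
        Point p = null;
        Console.WriteLine(Probe(() => Console.WriteLine(p.X)));      // case 1: field access
        Console.WriteLine(Probe(() => Console.WriteLine(p.GetX()))); // case 2: virtual call
    }
}
```

Both lines print NullReferenceException. The sketch only shows the observable result; it does not reveal whether the runtime got there via a trapped access violation or an explicit check.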

  1. It could access an arbitrary bit of memory that is not really an object of our type, leading to all sorts of exciting and really hard to trace bugs (.NET code generally won't result in anything that causes this scenario).

  2. It could access an arbitrary bit of memory that is protected, leading to an access violation.

You may know about the second case from C, C++ or ASM coding. If not, you'll probably still have seen a program crash and with its dying breath talk about an access violation at some address. If so, you may have noticed that while the address given could be just about anything, it'll most often be either 0x00000000 or something very low like 0x00000020. Those were caused by code trying to dereference a null pointer whether to access a field or call a virtual method (which is essentially accessing a field and then calling depending on what you get).

Now, since the first 64 KB of memory is always protected, dereferencing a null pointer will always result in the second case (access violation) rather than the first case (arbitrary memory being misused and resulting in bizarre "fandango on core" bugs).

This is all exactly the same with .NET (or rather, with the jitted code produced by it), but if (A) the access violation happened at an address lower than 0x00010000 and (B) the violation is found to have happened in code that was jitted, then it is turned into a NullReferenceException; otherwise it gets turned into an AccessViolationException.

We can simulate this with code that doesn't dereference a null reference, but which does access protected memory (we'll only read, so if we should happen to accidentally hit memory that isn't protected, the result won't be too weird!):
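
The two conditions above can be sketched as a small decision function. This is only a model of the rule as described here, not the CLR's actual code; `FaultClassifier`, `Classify` and `NullRegionLimit` are hypothetical names:

```csharp
using System;

static class FaultClassifier
{
    // Boundary quoted above: faults below this address are assumed to be null dereferences.
    const ulong NullRegionLimit = 0x00010000;

    // Hypothetical model: maps a faulting address and its origin to an exception name.
    public static string Classify(ulong faultAddress, bool inJittedCode)
    {
        bool looksLikeNull = faultAddress < NullRegionLimit;
        return (looksLikeNull && inJittedCode)
            ? nameof(NullReferenceException)
            : nameof(AccessViolationException);
    }

    static void Main()
    {
        Console.WriteLine(Classify(0x20, inJittedCode: true));        // low address, jitted code
        Console.WriteLine(Classify(0x7FFF0000, inJittedCode: true));  // high address
        Console.WriteLine(Classify(0x20, inJittedCode: false));       // low address, non-jitted code
    }
}
```

Only the first call yields NullReferenceException; the other two yield AccessViolationException.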

The following code will raise an AccessViolationException:

unsafe
{
  // Reads 8 ints (32 bytes) below long.MaxValue: far above the low 64 KB.
  int read = *((int*)long.MaxValue - 8);
}

The following code will raise a NullReferenceException:

unsafe
{
  // Address 8 lies inside the protected low 64 KB.
  int read = *((int*)8);
}

Neither snippet actually dereferences a null reference. Both cause access violations, but the CLR assumes the latter was probably caused by a null reference (in fairness, by far the most likely scenario) and raises a NullReferenceException.

So, we can see how field access and callvirt can cause this.

It's worth noting now that, because of a decision to not allow C# to call methods on null references even when safe to do so, callvirt is used as the IL for the majority of calls in C#; the only exceptions are static methods or cases where it can be shown at compile time that the call is not on a null reference. (Edit: There are a few other cases where the compiler can see that a callvirt can be replaced by a call, even when the method actually is virtual [if the compiler can tell which override would be hit], and later compilers will do this slightly more often, though they will still use callvirt more often than you might imagine.)

An interesting case is where optimisation means that a method called with callvirt could be inlined, but where the reference isn't known at compile time to be guaranteed non-null. In such a case a field access may be added before the place where the "call" (that isn't really a call) happens, precisely to trigger the NullReferenceException at the start, rather than in the middle, of the method. This means the optimisation does not change the observed behaviour.
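
The observable effect can be sketched as follows: the exception is raised at the call site, before any of the method body runs, whether or not the JIT chooses to inline. `Counter` and `LogThenRead` are hypothetical names made up for this example:

```csharp
using System;

// Hypothetical type used only for this illustration.
class Counter
{
    public int Value;

    // Non-virtual method: performs a side effect first and touches a field only afterwards.
    public int LogThenRead(Action sideEffect)
    {
        sideEffect();
        return Value;
    }
}

static class CallOnNullDemo
{
    // Returns true if the method body started running before the exception was thrown.
    public static bool BodyRanBeforeThrow()
    {
        bool bodyRan = false;
        Counter c = null;
        try
        {
            c.LogThenRead(() => bodyRan = true);
        }
        catch (NullReferenceException)
        {
            // Thrown at the call site, before LogThenRead's body executes.
        }
        return bodyRan;
    }

    static void Main()
    {
        Console.WriteLine(BodyRanBeforeThrow()); // prints "False"
    }
}
```

If the null check happened only at the field read inside the method, the side effect would have run first and the result would be True.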



Answer 2:

The MS implementation, IIRC, does this via an access violation. Null is essentially a zero reference, and basically: they deliberately reserve that address space and leave this page unmapped. The memory access violation is raised at the CPU/OS level automatically (i.e. without needing extra code to do a null check), and the CLI then reports this as a null-reference exception.

Interestingly, because memory is handled in pages, you can actually simulate (if you try hard enough) a null-reference exception on a non-zero but low value, for the same reasons.

Edit: Eric Lippert discusses this on this related question/answer: https://stackoverflow.com/a/8681563



Answer 3:

Have you read the CLI Spec - ECMA-335? You will find some answers there.

11 Semantics of classes...When a variable or field that has a class as its type is created (for example, by calling a method that has a local variable of a class type), the value shall initially be null, a special value that is assignable-to all class types even though it is not an instance of any particular class.

And the description of the ldnull instruction:

The ldnull pushes a null reference (type O) on the stack. This is used to initialize locations before they become live or when they become dead. [Rationale: It might be thought that ldnull is redundant: why not use ldc.i4.0 or ldc.i8.0 instead? The answer is that ldnull provides a size-agnostic null – analogous to an ldc.i instruction, which does not exist. However, even if CIL were to include an ldc.i instruction it would still benefit verification algorithms to retain the ldnull instruction because it makes type tracking easier. end rationale] Verifiability: The ldnull instruction is always verifiable, and produces a value of the null type (§1.8.1.2) that is assignable-to (§I.8.7.3) any other reference type.
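
Seen from C#, the two quoted passages correspond to something like the following sketch (the `Holder` class is hypothetical): class-typed fields start out as null, and the same null value can be assigned to unrelated reference types.

```csharp
using System;

// Hypothetical type used only for this illustration.
class Holder
{
    public string Name;      // class-typed field: initially null
    public Exception Error;  // a different class type, also initially null
}

static class NullDefaultDemo
{
    static void Main()
    {
        var h = new Holder();
        Console.WriteLine(h.Name == null);   // prints "True"
        Console.WriteLine(h.Error == null);  // prints "True"

        // The same null literal is assignable to unrelated reference types.
        object o = null;
        string s = null;
        Version v = null;
        Console.WriteLine(o == null && s == null && v == null); // prints "True"
    }
}
```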