For example, the following code demonstrates my line of thought:
    using System;

    class Program
    {
        static void Main(string[] args)
        {
            int i = 0;
            IsNull(i); // Works fine
            string s = null;
            IsNull(s); // Blows up
        }

        static void IsNull<T>(T obj)
        {
            if (obj == null)
                throw new NullReferenceException();
        }
    }
Also the following code:
    int i = 0;
    bool b = i == null; // Always false
Is there an implicit object cast going on, such that:
    int i = 0;
    bool b = (object)i == null;
There's value in both of the answers here, and I would give the mark to phoog for answering the practical concern most people have when they ask about this (variants of it have come up before). But both answers are also incomplete.
There are four ways of looking at the code in question, and all four are important, and the answers have only looked at two (though phoog's entailed a lot about one other).
I'll start with the part of the question that was ignored so far, "Is there an implicit object cast going on?":
Well, yes and no. It depends on the level at which we are looking, and we actually have to look at it at different levels at different times, so saying so is not mere pedantry.
C# is four things:
So far the answers have looked at points 2 and 3, but the full picture takes in all four.
And the most important points are actually points 1 and 4.
Point 1 is important because C# is after all the language we are looking at, and the source code is the view colleagues are most likely to read. Since programming is partly instructing a computer to do something, and partly expressing one's intent in doing so (medium- and high-level programming languages are for people first, computers second), the actual source code is important.
Point 4 is important because that is after all our final goal. It is not the same thing as looking at the assembly of the machine code (as phoog's answer did) because machine code is not the final answer as to what changes and optimisations are done:
Now, all that said, in the cases we're looking at now, the machine code is about as far as we need to look to reason about the machine's behaviour. In general though, machine code is not the final answer every time. Still, phoog's answer isn't at fault for implying rather than stating the impact here; I only mention it because I'm aiming to write about the different conceptual levels at which both phoog and xxbbcc are correct in different ways.
Coming back to our code of
    bool b = i == null
where i is of type int.
In C# null is defined as a literal value that is the default for all reference types, and for nullable value types. It can be compared with any value for reference equality - that is, the question "Are X and Y the same instance?" can be asked with null as the value for X, and the answer is true if Y is not an instance, and false otherwise.
To make this comparison with a value type, we must box the value type, just as we must in any of those cases where we need to treat a value type as a reference type.
If the value type is a nullable value type, and it is null (HasValue returns false), then boxing produces a null reference. In all other cases boxing a value type creates a reference to a new object on the heap, of type object, which holds a copy of the value and can be unboxed back to it.
Therefore the answer at the conceptual level of C# is "yes, i is implicitly boxed to create a new object that is then compared to null [which hence will always return false]".
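The boxing rules described above can be observed directly from C#. The following is a minimal sketch, not code from the question; the class name BoxingDemo and the helper BoxIsNull are made up for illustration.

```csharp
using System;

static class BoxingDemo
{
    // Boxes a non-nullable int (conversion to object) and reports whether
    // the resulting reference is null. It never is.
    public static bool BoxIsNull(int value)
    {
        object boxed = value;      // always a new, non-null object
        return boxed == null;      // reference comparison
    }

    // Boxes a nullable int. The box is a null reference exactly when
    // HasValue is false.
    public static bool BoxIsNull(int? value)
    {
        object boxed = value;
        return boxed == null;
    }

    static void Main()
    {
        Console.WriteLine(BoxIsNull(0));            // False
        Console.WriteLine(BoxIsNull((int?)null));   // True
        Console.WriteLine(BoxIsNull((int?)0));      // False

        // Each boxing operation creates its own object, so two boxes of
        // the same value are different instances.
        object a = 5, b = 5;
        Console.WriteLine(ReferenceEquals(a, b));   // False
    }
}
```

Here the boxing is spelled out with an explicit assignment to object, so no compiler shortcut hides it.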
At the next level, we have CIL.
In CIL, null is a value with a natural word-size (32-bit in a 32-bit process, 64-bit in a 64-bit process) bit pattern of all-zero (hence brfalse, brzero and brnull all just being aliases for the same bytecode), which is a valid value for managed pointers, pointers, natural integers and any other means of giving an address.
Also in CIL, boxing is done to an equivalent boxed type; it's not just object, but boxed type of int, boxed type of float, etc. This is hidden from C# because it's not very useful (you can't do anything with these types other than those things you can do on object and unbox back to the equivalent unboxed type), but is more precisely defined in CIL because CIL has to deal with the implementation question of "how can boxing be done on lots of different types?".
The equivalent code in CIL would at a minimum be:
I say "at a minimum" as there might be some loading from and storing to the locals array for the method in question.
So, at the CIL level the answer is also "yes, i is implicitly boxed to create a new object that is then compared to null [which hence will always return false]".
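Although C# only lets us name the boxed value as object, the fact that the box keeps its exact value type can still be seen at run time. This is a small sketch under that reading; BoxedTypeDemo and its helpers are hypothetical names, not anything from the answers.

```csharp
using System;

static class BoxedTypeDemo
{
    // The static type is object, but the box remembers what it holds.
    public static string BoxedTypeName(object boxed) => boxed.GetType().FullName;

    // Unboxing only succeeds to the exact value type that was boxed;
    // a boxed int cannot be unboxed directly as a long.
    public static bool CanUnboxAsLong(object boxed)
    {
        try
        {
            long unused = (long)boxed;
            return true;
        }
        catch (InvalidCastException)
        {
            return false;
        }
    }

    static void Main()
    {
        object o = 42;
        Console.WriteLine(BoxedTypeName(o));    // System.Int32
        Console.WriteLine((int)o);              // 42
        Console.WriteLine(CanUnboxAsLong(o));   // False
    }
}
```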
However, this is not actually the CIL that would be produced. In a release build it would be:
That is, it will optimise the code that always produces false to code that just produces false. Even in a debug build we would likely have some optimisation.
But I wasn't lying when I said that in CIL the code for comparing an integer with null involves boxing; it does, but the C# compiler can see that this code is a waste of time, and just replaces it with code that loads false into b. Indeed, if b isn't used later on, it might just cut out the whole thing. (Conversely, if i is used later on, it will still load 0 into it at some point, rather than cut it out as in the example above.)
This is the first time we've come up against compiler optimisation here, and it's time to examine just what that means.
Compiler optimisation comes down to a simple observation; if a piece of code can be rewritten as a different piece of code that has the same effects as seen from the outside, but is faster and/or uses less memory and/or results in a smaller executable, then only a moron will complain if you produced the faster/smaller/lighter version instead.
This simple observation becomes complicated by two things. The first is what to do when given the choice between a faster version and a lighter version. Some compilers give options for weighing these choices (most C++ compilers do), but C# does not. The other is what "as seen from the outside" means. It used to be simply "any output produced, interactions with other processes, or operations on volatile* variables". It gets a bit more complicated when you have multiple threads, one of which is performing garbage collection, all of which are of course "outside" of each other, in that this increases the number of cases where an optimisation (esp. if it involves reordering) could affect what is observed. Still, none of that applies here.
The C# compiler does not do a lot of optimisation, since the jitter is going to do a lot anyway, so the downside of optimisation (1. all work is a chance for a bug so if you don't do a particular optimisation you won't have a bug related to that optimisation. 2. the more you optimise something the more you can confuse the developer looking at it) becomes more significant if a given optimisation would be done by the next layer anyway.
Still, it does do that optimisation.
Indeed, it will optimise away whole sections. Take the code:
Compile it, then decompile it back into C# and you get:
Because the compiler can remove entire sections of code it knows will never be hit.
So, while in both C# and IL, comparing a value type to null involves boxing, the C# compiler will remove such pointless cruft and no boxing will actually happen. It will also issue warning CS0472, because if you put obviously pointless cruft in your code something was likely wrong with your thinking, and you should look at it and figure out what you really meant to do.
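As a rough illustration of the kind of code that triggers the warning, here is a minimal sketch. The names DeadBranchDemo and Describe are invented for this example, and the exact wording of the warning may differ between compiler versions.

```csharp
using System;

static class DeadBranchDemo
{
    public static string Describe(int i)
    {
        // The compiler reports warning CS0472 on this comparison, because
        // a non-nullable int can never be equal to null. The branch below
        // can never be taken.
        if (i == null)
        {
            return "null";
        }
        return "not null";
    }

    static void Main()
    {
        Console.WriteLine(Describe(0));   // not null
    }
}
```

This compiles with warnings rather than errors, unless the project treats warnings as errors.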
It's worth at this point also looking at what would happen if i was of type int?, which can be boxed to a null. There is still an optimisation made: the boxing and comparison are replaced with a check of the HasValue field. This is more efficient than boxing.
(The matter of assembly is irrelevant at this stage, since the boxing and comparison have already been removed.)
Now, if we have the case of a generic method (or method of a generic class) that accepts both value and reference type parameters, this optimisation cannot be done by the C# compiler, because generic methods aren't instantiated into their particular specialised form at compile time (unlike the otherwise similar C++ templates), but at jitting time.
For this reason, the IL produced will always include the boxing operation (unless there was another reason why it could be optimised away even in the case of reference types).
The jitter though, has much the same knowledge of the fact that boxing a non-nullable value type will never produce a null value, that the C# compiler did with our first example. It is also much more aggressive in optimisation than the C# compiler ever is.
This is where we get the behaviour that phoog described in their answer: In the code produced for a value-type type parameter, the boxing operation is completely removed (with a reference-type parameter the boxing operation is essentially a no-op and also removed). The check is removed, as the answer is known, and indeed entire sections of code that would be executed only if that check had returned true, are also removed.
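The observable behaviour of such a generic check, whatever the jitter does with it internally, can be sketched as follows. GenericNullCheck is a hypothetical class, and IsNull here returns a bool instead of throwing, purely so that the results can be printed.

```csharp
using System;

static class GenericNullCheck
{
    // No constraint on T, so the same source serves value types,
    // reference types and nullable value types.
    public static bool IsNull<T>(T obj) => obj == null;

    static void Main()
    {
        Console.WriteLine(IsNull(0));              // False: int is never null
        Console.WriteLine(IsNull("text"));         // False
        Console.WriteLine(IsNull<string>(null));   // True
        Console.WriteLine(IsNull<int?>(null));     // True: HasValue is false
        Console.WriteLine(IsNull<int?>(0));        // False
    }
}
```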
The case phoog didn't examine is that of a nullable value type. Here, at a very minimum the boxing and comparison will be replaced with a call to HasValue, which in turn will be inlined to a read of the internal field in the struct. Possibly (if it's known that the value is never null, or if it's known that it's always null) that will be removed, along with one whole section of code that would never be executed anyway.
Summary
There are two more specific questions behind your question, and you may be interested in one or both of them.
Question 1: I am interested in how C# functions as a language, and I want to know if as far as C# is concerned, comparing a non-nullable value-type with null boxes that value type.
Answer 1: Yes, a comparison with null can only be done with a reference type - including a boxed value type - and so there is always a boxing operation.
Question 2: I have generic code which compares a value with null, because I want to do something only if it's a reference type or nullable value type, and if the value is equal to null. Will my code pay the performance penalty of a boxing operation in the cases where the type compared is a value type?
Answer 2: No. In those cases where the C# compiler cannot optimise away the code from the IL it produces, the jitter still can. For non-nullable value types the entire boxing operation, comparison, and code-path only taken when the comparison with null returned true, will all be removed from the machine code produced, and thus from the work the computer does. Furthermore, if it's a nullable value type, the boxing and comparison will be replaced with an examination of the field in the value that indicates whether HasValue is true or not.
*Note that this definition of volatile is related to, but not the same as, that in .NET, for reasons that are also related to how greater support for multi-threaded execution has complicated things from how they were in the 1960s.
Yes, obj gets boxed by the compiler. This is the IL generated for your IsNull function:
The box instruction is where the casting happens.
The compiler doesn't know anything specific about T so it must assume that it must be an object - the base type of everything in .NET; this is why it boxes obj to make sure that the null check can be performed. If you use a type constraint you can give more information to the compiler about T.
For example, if you use where T : struct your IsNull function won't compile anymore because the compiler knows T is a value type and null is not a value for value types.
Boxing a value type instance always returns a valid (non-null) object instance*, so the IsNull function would never throw for a value type. This is actually correct behavior if you think about it: the numeric value 0 is not null - a value type value cannot possibly be null.
In the code above brtrue.s is very much like if(objref!=0) - it doesn't check the value of the object (the value type value before boxing) because at the time of the check, it's not a value that's on top of the stack: it's the boxed object instance that's on top. Since that value (it's really a pointer) is non-null, the check for null never comes back as true.
*Jon Hanna pointed out in a comment that this statement is not true for default(Nullable<T>) which is correct - boxing this value returns null for any T.
xxbbcc's answer assumes that the OP is asking "why isn't 0 equal to null", which may well be what the question is all about. On the other hand, in the context of generic types, questions about boxing often have to do with the performance benefit that generic types offer by avoiding boxing.
In considering that question, the IL could be misleading. It includes a box instruction, but that doesn't mean that a boxed instance of the value type will actually be allocated on the heap. The IL "boxes" the value because the IL code is also generic; the substitution of type arguments for type parameters is the responsibility of the JIT compiler. For a non-nullable value type, the JIT compiler optimizes away the IL instructions for boxing and checking the result because it knows that the result will always be non-null.
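One way to check this claim without reading machine code is to measure allocations around the generic check. The sketch below is an assumption-laden experiment, not part of the original investigation: AllocationProbe and MeasureIntChecks are invented names, and GC.GetAllocatedBytesForCurrentThread requires a reasonably recent runtime (.NET Core, or .NET Framework 4.8). In an optimised build the reported figure would be expected to be zero or close to it, but a Debug build or a run under the debugger may behave differently, so no output is promised here.

```csharp
using System;

static class AllocationProbe
{
    public static bool IsNull<T>(T obj) => obj == null;

    // Returns the number of bytes allocated on the current thread while
    // the null check runs 'iterations' times on int arguments.
    public static long MeasureIntChecks(int iterations)
    {
        IsNull(0);   // warm up so the method is jitted before measuring

        long before = GC.GetAllocatedBytesForCurrentThread();
        bool anyNull = false;
        for (int n = 0; n < iterations; n++)
        {
            anyNull |= IsNull(n);
        }
        long after = GC.GetAllocatedBytesForCurrentThread();

        if (anyNull)
        {
            throw new InvalidOperationException("an int compared equal to null");
        }
        return after - before;
    }

    static void Main()
    {
        Console.WriteLine(MeasureIntChecks(1_000_000));
    }
}
```

If each call really boxed its argument, a million iterations would show up as several megabytes of allocation, which makes the difference easy to spot.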
I added a Thread.Sleep call to the sample code, to give time to attach the debugger. (If you start the debugger in Visual Studio with F5, certain optimizations are disabled even if it is a release build.) Here's the machine code in a Release build:
Note that the call instruction has a different target for the int and the string. Here they are:
and
Looks more or less the same, right? But here's what you get if you start the process first and then attach the debugger:
Not only has the optimizer removed the boxing of the value type, it has inlined the call to the IsNull method for the value type by removing it altogether. It's not obvious from the above machine code, but the call to IsNull for the reference type is also inlined. The call 706C4988 instruction seems to be the NullReferenceException constructor, and call 715E5170 seems to be the throw.