I think I already know the answer for a class and just want to confirm that my understanding is correct. Say I have a class ClassA and an instance of it named a. When a.MethodA() is invoked:
(1) The CLR finds the type ClassA through the type pointer of a in the heap (the type has already been loaded into the heap).
(2) It looks for MethodA in that type; if it is not found, it goes to the base type, and so on up to the object class.
Maybe my understanding is not quite precise, but I think it's basically correct (correct me if it's wrong!). That brings me to the question about a simple struct:
struct MyStruct
{
    public void MethodA() { }
}
I have var x = new MyStruct(); its value is on the stack, and the type MyStruct has been loaded into the heap. When x.MethodA() executes, there is of course no boxing. How does the CLR find MethodA, get its IL, and JIT/execute it? I think the answer is probably (again, correct me if I'm wrong):
(1) We have the declaring type of x on the stack. The CLR finds the type from the info on the stack and then finds MethodA in that type. Let's call this assumptionA.
I'll be happy if you tell me that assumptionA is correct. But even if it's wrong, it points to a truth: the CLR has a way to find a struct's type without boxing.
Now what about x.ToString() or x.GetType()? We know that the value will be boxed and will then behave like a class instance. But why do we need boxing here? Since we can get its type (as assumptionA tells us), why not go to its base type and find the method there, just like for a class? Why do we need an expensive box operation here?
AssumptionA is wrong. The C# compiler's symbol table stores type information. That static type information is used in nearly all cases; the dynamic type stored in an object is only needed for type checks (the is operator), casting (the as operator and actual cast syntax), and array variance, and then only when the dynamic type isn't known to the compiler. The dynamic type of an unboxed struct is always statically known, and the dynamic type of a class instance is statically known near the instantiation and inside a conditional block that performed a type check (e.g. in if (x is T) y = (T)x; the type is known inside the then-part, so the cast doesn't require another dynamic check).
OK, now because the C# compiler statically knows the type of x, it can do overload resolution and find the exact MethodA being called. Then it emits MSIL to push the arguments onto the MSIL virtual stack and issues a call instruction containing a metadata reference to that particular method. No type checks are needed at runtime.
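To make the static-binding point concrete, here is a minimal sketch. The Demo class and its Describe helper are names invented for illustration, and the IL in the comment is roughly what a debug build tends to emit, not a guarantee. Reflection confirms that MethodA is a non-virtual method declared on MyStruct itself, so there is nothing left to look up at runtime:

```csharp
using System;
using System.Reflection;

struct MyStruct
{
    public void MethodA() { }
}

static class Demo
{
    // Hypothetical helper: reports where MethodA is declared and whether it is virtual.
    public static string Describe(Type t)
    {
        MethodInfo m = t.GetMethod("MethodA");
        return $"{m.DeclaringType.Name}.{m.Name} virtual={m.IsVirtual}";
    }

    static void Main()
    {
        var x = new MyStruct();
        // Roughly: ldloca.s x / call instance void MyStruct::MethodA()
        // The address of x is pushed; no box instruction is involved.
        x.MethodA();
        Console.WriteLine(Describe(typeof(MyStruct))); // MyStruct.MethodA virtual=False
    }
}
```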
For x.ToString(), the C# compiler still knows the exact method it wants to call. If ToString has been overridden by the struct type, it expects a parameter of type pointer-to-MyStruct, which the compiler handles without boxing. If ToString has not been overridden, the compiler generates a call to Object.ToString, which expects an object as its parameter. To push x on the MSIL virtual stack as the correct type requires boxing.
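The distinction between an overridden and a non-overridden ToString can be checked with reflection. This is only a sketch: Plain, WithOverride, and ToStringOwner are hypothetical names, and the code shows which type supplies the method body, not the IL that a particular compiler version emits for the call:

```csharp
using System;

struct Plain { }

struct WithOverride
{
    public override string ToString() => "custom";
}

static class ToStringDemo
{
    // Hypothetical helper: reports which type declares the parameterless ToString.
    public static Type ToStringOwner(Type t) =>
        t.GetMethod("ToString", Type.EmptyTypes).DeclaringType;

    static void Main()
    {
        // The struct supplies its own body, which takes a pointer to the struct as "this".
        Console.WriteLine(ToStringOwner(typeof(WithOverride))); // WithOverride
        // The body is inherited from a reference type, so "this" must be an object.
        Console.WriteLine(ToStringOwner(typeof(Plain)));        // System.ValueType
    }
}
```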
GetType is a special case: when the type is known statically, the compiler won't call any method; it just gets the type information from the symbol table and stuffs the metadata reference into the MSIL directly.
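Whatever IL ends up being emitted for the call, the observable result is the same: for an unboxed struct, x.GetType() always yields the statically known type. A small sketch, with GetTypeDemo and SameType as made-up names:

```csharp
using System;

struct MyStruct { }

static class GetTypeDemo
{
    // Compares the runtime type of a struct local with its compile-time type.
    public static bool SameType()
    {
        var x = new MyStruct();
        return x.GetType() == typeof(MyStruct);
    }

    static void Main()
    {
        Console.WriteLine(SameType()); // True
    }
}
```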
Well, there are a few different things going on here:
For methods that are defined in the struct, the CLR just looks at the type definition in the assembly metadata when it's being loaded in order to figure out what the methods are, and when a method Foo calls MethodA, the CLR just binds to the correct method when MethodA is JIT'd. Nothing else actually happens after the compilation has taken place; the method is called directly, because all the information that's needed is already present.
For virtual inherited struct methods like ToString, I originally wrote that there has to be boxing, because virtual calls can only be made on Objects by design: without boxing, there's no v-table to look into in order to figure out the resulting method. Apparently there's no boxing; I was wrong because I didn't notice that these methods are overridden. Indeed, for overridden methods, the compiler does perform the optimization of just calling the method directly, because there's no reason not to. (There are no virtual methods for value types that are not overridden, so that's not actually an issue here.)
For non-virtual struct methods that are inherited, the object needs to be boxed simply because the method is, by definition, being called on a reference type, not on a value type. There's no need to special-case this in the compiler, because I believe the JIT compiler can actually do optimizations (like avoiding boxing) when it's JIT'ing a method like GetType (though someone please correct me if I'm wrong about this optimization).
EDIT: Thanks for the comments. I thought I understood how it works... not anymore. So I'll leave this as a starting point for investigation, but not as an answer.
There may be no boxing when calling ToString or other virtual functions on structs, because there is no need for a v-table lookup. Structs are sealed, so the exact method is known at compile time.
On the other hand, as pointed out in the comments, virtual functions from the base class need an Object as the "this" parameter.
On the third hand, looking at the generated IL, it is unclear whether ToString and GetHashCode actually box (likely it is hidden somewhere, since there is a comment about boxing in these cases here: http://blogs.msdn.com/b/lucabol/archive/2007/12/24/creating-an-immutable-value-object-in-c-part-iii-using-a-struct.aspx). GetType definitely requires explicit boxing.
Looking at the output of ILDasm to see whether there is boxing or a direct call:
int v = 42;
string s = v.ToString();
object a = v;
s = a.ToString();
This gets compiled (debug) into the following IL. There is no boxing for int.ToString(), but there definitely is for the cast to object...
IL_0001: ldc.i4.s 42
IL_0003: stloc.0
IL_0004: ldloca.s v
IL_0006: call instance string [mscorlib]System.Int32::ToString()
IL_000b: stloc.1
IL_0013: ldloc.0
IL_0014: box [mscorlib]System.Int32
IL_0019: stloc.2
IL_001a: ldloc.2
IL_001b: callvirt instance string [mscorlib]System.Object::ToString()
IL_0020: stloc.1
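The box instruction in the listing copies the value into a new heap object. A short sketch (BoxingDemo and BoxedAfterMutation are illustrative names) makes that copy visible by changing the local after the box has been taken:

```csharp
using System;

static class BoxingDemo
{
    // Returns what the boxed copy prints after the original local has changed.
    public static string BoxedAfterMutation()
    {
        int v = 42;
        object a = v;   // box: copies the value 42 into a new heap object
        v = 43;         // changes only the local, not the boxed copy
        return a.ToString();
    }

    static void Main()
    {
        Console.WriteLine(BoxedAfterMutation()); // 42
    }
}
```

The boxed object keeps the value it was created with, which is one reason boxing cannot be treated as free: it allocates and copies.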