Is Richter mistaken when describing the internals

Posted 2019-03-18 17:16

Question:

I would send this question directly to Jeffrey Richter, but last time he didn't answer me :) so I'll try to get an answer here with your help, guys :)

In the book "CLR via C#", 3rd edition, on p.108, Jeffrey writes:

void M3() {
  Employee e;
  e = new Manager();
  year = e.GetYearsEmployed();
  ...
}

The next line of code in M3 calls Employee’s nonvirtual instance GetYearsEmployed method. When calling a nonvirtual instance method, the JIT compiler locates the type object that corresponds to the type of the variable being used to make the call. In this case, the variable e is defined as an Employee. (If the Employee type didn’t define the method being called, the JIT compiler walks down the class hierarchy toward Object looking for this method. It can do this because each type object has a field in it that refers to its base type; this information is not shown in the figures.) Then, the JIT compiler locates the entry in the type object’s method table that refers to the method being called, JITs the method (if necessary), and then calls the JITted code.
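To make the lookup Richter describes concrete, here is a minimal reflection-based sketch. It assumes (hypothetically) that GetYearsEmployed is declared on a base class EmployeeBase rather than on Employee itself. The helper FindDeclaringType is illustrative only. Type.BaseType stands in for "the field that refers to its base type". This is not how the JIT is actually implemented.

```csharp
using System;
using System.Reflection;

// Hierarchy assumed for illustration: GetYearsEmployed lives on a base class.
public class EmployeeBase { public int GetYearsEmployed() { return 1; } }
public class Employee : EmployeeBase { public void SomeOtherMethod() { } }
public class Manager : Employee { public void GenProgressReport() { } }

public static class HierarchyWalk
{
    // Hypothetical helper (not the JIT's real code): check only the methods a type
    // declares itself, then follow Type.BaseType toward System.Object.
    public static Type FindDeclaringType(Type start, string methodName)
    {
        const BindingFlags declaredOnly = BindingFlags.Public | BindingFlags.NonPublic |
                                          BindingFlags.Instance | BindingFlags.DeclaredOnly;

        for (Type t = start; t != null; t = t.BaseType)
        {
            if (t.GetMethod(methodName, declaredOnly) != null)
                return t; // first type up the chain that declares the method
        }
        return null; // not declared anywhere up to and including System.Object
    }

    public static void Main()
    {
        // In M3 the variable is typed Employee, so the search starts at Employee.
        Console.WriteLine(FindDeclaringType(typeof(Employee), "GetYearsEmployed")); // EmployeeBase
    }
}
```

Whether the real JIT ever needs such a walk is exactly what the question disputes.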

When I first read this, I thought it would be inefficient to walk the class hierarchy looking for the method during JIT compilation, since the method can easily be found at compile time. But I trusted Jeffrey. I posted this on another forum, and someone there shared my doubts: it seemed strange and inefficient, and probably wrong.

And indeed, if you look at the corresponding IL in a decompiler such as ILDasm or Reflector (I've checked both), you will see a callvirt instruction that calls the method on the base class. So the JIT doesn't need to figure out at runtime which class the method is located in:

public class EmployeeBase
{
    public int GetYearsEmployed() { return 1; }
}

public class Employee : EmployeeBase
{
    public void SomeOtherMethod() { }
}

public class Manager : Employee
{
    public void GenProgressReport() { }
}

...

Employee e;
e = new Manager();
int years = e.GetYearsEmployed();

Resulting IL is:

L_0000: nop 
L_0001: newobj instance void TestProj.Form1/Manager::.ctor()
L_0006: stloc.0 
L_0007: ldloc.0 
L_0008: callvirt instance int32 TestProj.Form1/EmployeeBase::GetYearsEmployed()

You see? The compiler has already figured out that the method is located not in the Employee class but in the EmployeeBase class, and emitted the correct call. But according to Richter, the JIT would have to find out at runtime that the method is actually located in EmployeeBase.

Is Jeffrey Richter mistaken, or am I misunderstanding something?

Answer 1:

The C# compiler resolves non-virtual methods exactly, with no wiggle room. Suppose a non-virtual method with the same signature is later added to a derived class, after the caller was compiled. The CLR will still call the "fixed" method the C# compiler chose. This avoids the brittle base class problem.

If you want dynamic method resolution, use virtual. If you don't use virtual, you get fully static resolution. Your choice. The runtime type of the object reference that becomes the this pointer plays no part in resolving non-virtual methods (neither for csc.exe nor for the CLR JIT).

The JIT will always call the exactly chosen method. It will throw an exception if the method does not exist (maybe because the callee DLL was changed). It will not call a different method.
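A small sketch of the static-versus-dynamic distinction described above. The method names Describe and Role are hypothetical. Manager hides the non-virtual Describe with new and overrides the virtual Role. Both methods are called through a variable typed Employee. Only the virtual call depends on the runtime type of the object.

```csharp
using System;

public class Employee
{
    // Non-virtual: the call target is fixed by the static type of the variable.
    public string Describe() { return "Employee.Describe"; }

    // Virtual: the call target is chosen from the runtime type of the object.
    public virtual string Role() { return "Employee.Role"; }
}

public class Manager : Employee
{
    // 'new' hides Employee.Describe; it does not override it.
    public new string Describe() { return "Manager.Describe"; }

    public override string Role() { return "Manager.Role"; }
}

public static class Dispatch
{
    public static string[] Run()
    {
        Employee e = new Manager(); // static type Employee, runtime type Manager
        return new[] { e.Describe(), e.Role() };
    }

    public static void Main()
    {
        foreach (string line in Run())
            Console.WriteLine(line);
        // Employee.Describe  (non-virtual: resolved against Employee)
        // Manager.Role       (virtual: dispatched on the runtime type)
    }
}
```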

callvirt can also call non-virtual methods, and C# uses it for them to get a null check. callvirt is defined to throw if the object reference is null, and C# is defined to perform a null check on every instance method call.
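To see that null check in action, consider a sketch in which GetYearsEmployed never touches this. The class and helper names are illustrative. The C# compiler emits callvirt for the call through the local e. So invoking it on a null reference throws NullReferenceException, even though the method body never dereferences anything.

```csharp
using System;

public class Employee
{
    // Never touches 'this', so a plain 'call' on null would not fault by itself.
    public int GetYearsEmployed() { return 1; }
}

public static class NullCheck
{
    public static bool ThrowsOnNull()
    {
        Employee e = null;
        try
        {
            // C# emits callvirt here, which checks 'e' for null before the call.
            e.GetYearsEmployed();
            return false;
        }
        catch (NullReferenceException)
        {
            return true;
        }
    }

    public static void Main()
    {
        Console.WriteLine(ThrowsOnNull()); // True
    }
}
```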



Answer 2:

From my understanding, and using your example, here is what happens under the hood:

A VIRTUAL method in a base class WILL have an entry in a derived class's method table. This means that all the virtual methods of the 'object' type are available in the method tables of all derived classes.

A NON-virtual method (as in the example code) that has no implementation supplied in the derived classes will NOT actually have an entry in the derived classes' method tables!

To check this, I ran the code in WinDbg to examine the method table for the Manager class.

MethodDesc Table
   Entry MethodDesc    JIT Name
506a4960   503a6728 PreJIT System.Object.ToString()
50698790   503a6730 PreJIT System.Object.Equals(System.Object)
50698360   503a6750 PreJIT System.Object.GetHashCode()
506916f0   503a6764 PreJIT System.Object.Finalize()
001b00c8   00143904    JIT Manager..ctor()
0014c065   001438f8   NONE Manager.GenProgressReport()

So I can see the virtual methods of object, but I can't see GetYearsEmployed, because it's not virtual and has no implementation in the derived class. By the same token, SomeOtherMethod doesn't appear in Manager's method table either.

You can, however, call these methods; they just aren't in the method table. I could be wrong, but I believe the call stack is walked to find them. Maybe this is what Mr Richter means in his book. I find his book difficult to read, but that's because the concepts are complicated and he is cleverer than me :)

I'm not sure the IL reflects the problem. I believe the answer may lie a layer below IL, which is why I've used WinDbg to take a look. I suppose you could use WinDbg to see if it walks the stack....
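Reflection offers a rough, metadata-level analogue of the WinDbg listing above. It reads declared metadata, not the runtime MethodTable, so it only approximates what WinDbg shows. This sketch lists the public instance methods Manager declares itself, then looks up GetYearsEmployed through Manager. The helper name Declared is hypothetical. The inherited method is still found, but its DeclaringType points back to the base class.

```csharp
using System;
using System.Linq;
using System.Reflection;

public class EmployeeBase { public int GetYearsEmployed() { return 1; } }
public class Employee : EmployeeBase { public void SomeOtherMethod() { } }
public class Manager : Employee { public void GenProgressReport() { } }

public static class DeclaredMethods
{
    // Hypothetical helper: names of the public instance methods 'type' declares itself
    // (inherited methods are excluded by DeclaredOnly).
    public static string[] Declared(Type type)
    {
        return type
            .GetMethods(BindingFlags.Public | BindingFlags.Instance | BindingFlags.DeclaredOnly)
            .Select(m => m.Name)
            .OrderBy(n => n)
            .ToArray();
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", Declared(typeof(Manager)))); // GenProgressReport

        // Without DeclaredOnly, lookup through Manager still finds the inherited method.
        MethodInfo m = typeof(Manager).GetMethod("GetYearsEmployed");
        Console.WriteLine(m.DeclaringType); // EmployeeBase
    }
}
```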



Answer 3:

As answered by @usr in a similar question I posted, "How is non-virtual instance method inheritance resolved?":

Runtime usually means "when/every time the code runs". The JIT resolution here happens only once, before the code runs. What the JIT does is not what is meant by "at runtime".

Also, in Jeffrey's words:

JIT compiler locates the type object that corresponds to the type of the variable being used to make the call.

By "the type of the variable", I believe he means "the class specified by the metadata token" (ECMA 335 III.3.19 call). The JIT uses that class to resolve the method destination.

The C# compiler always figures out the correct method to call and puts that information into the metadata token, so the JIT never has to "walk down the class hierarchy". (But it can, if you manually change the metadata token to refer to an inherited method.)

    class A
    {
        public static void Foo() {Console.WriteLine(1); }
        public void Bar() { Console.WriteLine(2); }
    }
    class B : A {}
    class C : B {}

    static void Main()
    {
        C.Foo();
        new C().Bar(); 
        C x = new C();
        x.Bar();
        Console.ReadKey();
    }

IL_0000:  call       void ConsoleApplication5.Program/A::Foo() // change to B::Foo()
IL_0005:  newobj     instance void ConsoleApplication5.Program/C::.ctor()
IL_000a:  call       instance void ConsoleApplication5.Program/A::Bar() // change to B::Bar()
IL_000f:  newobj     instance void ConsoleApplication5.Program/C::.ctor()
IL_0014:  stloc.0
IL_0015:  ldloc.0
IL_0016:  callvirt   instance void ConsoleApplication5.Program/A::Bar() // change to B::Bar()
IL_001b:  call       valuetype [mscorlib]System.ConsoleKeyInfo [mscorlib]System.Console::ReadKey()
IL_0020:  pop
IL_0021:  ret

If we use ILDasm and ILAsm to change A::Foo() to B::Foo() and A::Bar() to B::Bar(), the application still runs fine.
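A reflection-based check of the same idea, using the A/B/C classes from the answer. Asking C for Bar yields the one method declared on A, with the same metadata token. That is consistent with the compiler emitting A::Bar(). The helper name SameBar is hypothetical. Reflection only shows the metadata view, not what the JIT does when it resolves a modified token such as B::Bar().

```csharp
using System;
using System.Reflection;

class A
{
    public static void Foo() { Console.WriteLine(1); }
    public void Bar() { Console.WriteLine(2); }
}
class B : A { }
class C : B { }

static class Tokens
{
    // Hypothetical helper: true if Bar looked up through C is the very method declared on A.
    public static bool SameBar()
    {
        MethodInfo viaC = typeof(C).GetMethod("Bar");
        MethodInfo viaA = typeof(A).GetMethod("Bar");
        // Same MethodDef row in metadata: there is only one Bar, and A declares it.
        return viaC.MetadataToken == viaA.MetadataToken && viaC.DeclaringType == typeof(A);
    }

    static void Main()
    {
        Console.WriteLine(SameBar()); // True
    }
}
```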



Tags: .net clr jit