Why do we need “this pointer adjustor thunk”?

Posted 2020-02-10 07:26

Question:

I read about adjustor thunks here. Here is a quotation:

Now, there is only one QueryInterface method, but there are two entries, one for each vtable. Remember that each function in a vtable receives the corresponding interface pointer as its "this" parameter. That's just fine for QueryInterface (1); its interface pointer is the same as the object's interface pointer. But that's bad news for QueryInterface (2), since its interface pointer is q, not p.

This is where the adjustor thunks come in.

I am wondering why "each function in a vtable receives the corresponding interface pointer as its "this" parameter". Is it the only clue (base address) used by the interface method to locate data members within the object instance?

Update

Here is my latest understanding:

In fact, my question is not about the purpose of the this parameter, but about why we have to use the corresponding interface pointer as the this parameter. Sorry for my vagueness.

Using the interface pointer as a locator/foothold within an object's layout is one option. There are of course other means to do that, as long as you are the implementer of the component.

But this is not the case for the clients of our component.

When the component is built the COM way, clients of our component know nothing about its internals. Clients can only get hold of the interface pointer, and this is the very pointer that will be passed into the interface method as the this parameter. Under this expectation, the compiler has no choice but to generate the interface method's code based on this specific this pointer.

So the above reasoning leads to the result that:

it must be assured that each function in a vtable receives the corresponding interface pointer as its "this" parameter.

In the case of the "this pointer adjustor thunk", 2 different entries exist for a single QueryInterface() method. In other words, 2 different interface pointers could be used to invoke the QueryInterface() method, but the compiler only generates 1 copy of it. So if one of the interface pointers is chosen by the compiler as the this pointer, the other one has to be adjusted to the chosen one. This is what the this adjustor thunk is for.
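The situation can be reproduced in plain C++ without any COM headers. The following is only a sketch: IFirst, ISecond, Component and WhoAmI are hypothetical names standing in for the two interfaces and for QueryInterface, and the helper functions exist purely for illustration.

```cpp
#include <cassert>

// Hypothetical stand-ins for two COM-style interfaces (not real COM declarations).
struct IFirst {
    virtual const void* WhoAmI() = 0;   // plays the role of QueryInterface
    virtual ~IFirst() = default;
};
struct ISecond {
    virtual const void* WhoAmI() = 0;
    virtual ~ISecond() = default;
};

// One method body serves the WhoAmI entry in both vtables.
struct Component : IFirst, ISecond {
    int state = 42;
    const void* WhoAmI() override { return this; }  // 'this' has type Component* here
};

// True if the two interface pointers hold different addresses.
bool interface_pointers_differ() {
    Component c;
    IFirst*  p = &c;
    ISecond* q = &c;
    return static_cast<void*>(p) != static_cast<void*>(q);
}

// True if a call through either interface pointer reaches the same 'this'.
bool both_calls_see_same_this() {
    Component c;
    IFirst*  p = &c;
    ISecond* q = &c;
    return p->WhoAmI() == &c && q->WhoAmI() == &c;
}
```

Both helpers are expected to return true: p and q differ, yet the single method body sees the same this through either of them. How the second call gets its pointer corrected (a thunk, or something else) is up to the compiler.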

BTW-1: what if the compiler could generate 2 different instances of the QueryInterface() method, each one based on the corresponding interface pointer? This wouldn't need the adjustor thunk, but it would take more space to store the extra but similar code.

BTW-2: it seems that sometimes a question lacks a reasonable explanation from the implementer's point of view, but can be better understood from the user's point of view.

Answer 1:

Taking away the COM part from the question, the this pointer adjustor thunk is a piece of code that makes sure that each function gets a this pointer pointing to the subobject of the concrete type. The issue comes up with multiple inheritance, where the base and derived objects are not aligned.

Consider the following code:

#include <iostream>

struct base {
   int value;
   virtual void foo() { std::cout << value << std::endl; }
   virtual void bar() { std::cout << value << std::endl; }
};
struct offset {   // filler listed ahead of base in derived's base list
   char space[10];
};
struct derived : offset, base {
   int dvalue;
   virtual void foo() { std::cout << value << "," << dvalue << std::endl; }   // overrides base::foo
};

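(Disregard the lack of initialization.) The base subobject in derived is not aligned with the start of the object, as there is an offset in between[1]. When a pointer to derived is cast to a pointer to base (including implicit conversions, but not reinterpret_cast, which would cause UB and potential death), the value of the pointer is adjusted so that (void*)&d != (void*)(base*)&d for an assumed object d of type derived. The exact layout is implementation-specific: some ABIs place the polymorphic base first, in which case this particular hierarchy may show no offset, but the reasoning carries over to any base subobject that does not start at the beginning of the object.

Now consider the usage: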
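The size of the adjustment can be measured directly. This is a sketch under an assumption: the filler base is made polymorphic too (poly_offset, value_base and value_derived are hypothetical names), so that on common ABIs the second base cannot be moved to the front of the object.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical variant of the hierarchy above with a polymorphic filler base.
struct poly_offset {
    char space[10] = {};
    virtual ~poly_offset() = default;
};
struct value_base {
    int value = 0;
    virtual void foo() {}
    virtual ~value_base() = default;
};
struct value_derived : poly_offset, value_base {
    int dvalue = 0;
    void foo() override {}
};

// Byte distance between the start of the object and its value_base subobject.
std::ptrdiff_t base_offset_in_bytes() {
    value_derived d;
    value_base* b = &d;   // the implicit conversion adjusts the address
    return reinterpret_cast<char*>(b) - reinterpret_cast<char*>(&d);
}
```

The actual number depends on the platform's pointer size and padding, so only "non-zero" should be relied upon.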

derived d;
base * b = &d; // This generates an offset
b->bar();
b->foo();

The issue comes up when a function is called through a base pointer or reference. If the virtual dispatch mechanism finds that the final overrider is in base, then the this pointer must refer to the base subobject, as in b->bar(), where the implicit this pointer is the same address stored in b. If the final overrider is in a derived class, as with b->foo(), the this pointer has to point to the beginning of the object of the type where the final overrider is found (in this case derived).

What the compiler does is create an intermediate piece of code. When the virtual dispatch mechanism is invoked, and before dispatching to derived::foo, the intermediate code takes the this pointer and subtracts the offset to get to the beginning of the derived object. This operation is the same as the downcast static_cast<derived*>(this). Remember that at this point the this pointer is of type base*, so it was initially offset, and this effectively returns the original value &d.

[1] There is an offset even in the case of interfaces (in the Java/C# sense: classes defining only virtual methods), as they need to store a pointer to that interface's vtable.
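What such an intermediate piece of code does can be imitated by hand. The sketch below is not what any particular compiler emits: first_iface, second_iface, impl, impl_sum_body and impl_sum_thunk are hypothetical names, and the thunk is written as an ordinary function for illustration.

```cpp
#include <cassert>

struct first_iface {
    int a = 1;
    virtual int id() { return a; }
    virtual ~first_iface() = default;
};
struct second_iface {
    int value = 2;
    virtual int sum() { return value; }
    virtual ~second_iface() = default;
};
struct impl : first_iface, second_iface {
    int dvalue = 40;
    int sum() override { return value + dvalue; }   // final overrider lives in impl
};

// The "real" function body, written so that it expects a pointer to the full object.
int impl_sum_body(impl* self) { return self->value + self->dvalue; }

// Hand-written analogue of an adjustor thunk: it receives the pointer the
// caller holds (second_iface*), adjusts it, and forwards to the real body.
int impl_sum_thunk(second_iface* incoming_this) {
    impl* adjusted = static_cast<impl*>(incoming_this);   // subtracts the subobject offset
    return impl_sum_body(adjusted);
}
```

For an impl object reached through a second_iface pointer s, impl_sum_thunk(s) and the ordinary virtual call s->sum() give the same result; the compiler-generated version does the equivalent adjustment behind the vtable entry.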



Answer 2:

Here's an article on MSVC internals from one of the designers. It explains that and many other details of MSVC's implementation. You might also want to check my article on OpenRCE on how it all looks in assembly.



Answer 3:

Is it the only clue (base address) used by the interface method to locate data members within the object instance?

Yes, that's really all there is to it.



Answer 4:

Yes, this is essential for finding where the object starts. You write in your code:

variable = 10;

where variable is a member variable. First of all, which object does it belong to? It belongs to the object pointed to by the this pointer. So it's actually

this->variable = 10;

Now C++ needs to generate code that will actually do the job: copy the data. In order to do that, it needs to know the offset between the object start and the member variable. The convention is that this always points to the object start, so the offset can be a constant:

*(reinterpret_cast<int*>( reinterpret_cast<char*>( this ) + variableOffset ) ) = 10; //assuming variable is of type int
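The same idea can be written as a small compilable sketch. It assumes a simple standard-layout struct so that offsetof is well-defined; plain_object and store_by_offset are hypothetical names, and std::memcpy is used for the store to stay clear of aliasing concerns.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical standard-layout type; 'padding' pushes 'variable' away from offset 0.
struct plain_object {
    char padding[10];
    int variable;
};

// Roughly what "this->variable = v" boils down to: object start plus a constant offset.
void store_by_offset(plain_object* self, int v) {
    char* start = reinterpret_cast<char*>(self);
    std::memcpy(start + offsetof(plain_object, variable), &v, sizeof v);
}
```

If self did not point to the start of the object, the constant offset would land on the wrong bytes, which is why the method needs a correctly adjusted this.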


Answer 5:

I think it is important to point out that in C++ there is no such entity as an "interface pointer" or anything close to it. It is at best an idiom built on the concept of a restricted abstract class, which still remains a class. As such, all rules applying to class members and the handling of 'this' still apply unchanged. So in principle an interface class must behave like a stand-alone class of the given type, regardless of its function and its place in an inheritance hierarchy.

We can use the virtual method call mechanism to get to the actual (dynamic) type of the object exposed through an (interface) base class. How this is done is implementation-specific, including concepts such as the virtual method table (VMT) and "adjustor thunks". Generally, the compiler may use the initial 'this' pointer to locate the VMT and then the actual implementation of a given function, and call it with an adjustment of the 'this' pointer where needed. The thunk adjustment is generally needed to perform the final call when the base subobject we hold a reference to does not start at the same address as the derived object, as in the case of multiple inheritance.



Tags: c++ com