I was recently asked in an interview about object layout with virtual functions and multiple inheritance involved.
I explained it in context of how it is implemented without multiple inheritance involved (i.e. how the compiler generated the virtual table, insert a secret pointer to the virtual table in each object and so on).
It seemed to me that there was something missing in my explanation.
So here are questions (see example below)
- What is the exact memory layout of the object of class C.
- Virtual tables entries for class C.
- Sizes (as returned by sizeof) of object of classes A, B and C. (8, 8, 16 ?? )
- What if virtual inheritance is used. Surely the sizes and virtual table entries should be affected ?
Example code:
class A {
public:
virtual int funA();
private:
int a;
};
class B {
public:
virtual int funB();
private:
int b;
};
class C : public A, public B {
private:
int c;
};
Thanks!
1 thing to expect with multiple inheritance is that your pointer can change when casting to a (typically not first) subclass. Something you should be aware of while debugging and answering interview questions.
The memory layout and the vtable layout depend on your compiler. Using my gcc for instance, they look like this:
Note that sizeof(int) and the space needed for the vtable pointer can also vary from compiler to compiler and platform to platform. The reason why sizeof(C) == 20 and not 16 is that gcc gives it 8 bytes for the A subobject, 8 bytes for the B subobject and 4 bytes for its member
int c
.Using virtual inheritance
the layout changes to
Using gcc, you can add
-fdump-class-hierarchy
to obtain this information.First, a polymorphic class has at least one virtual function, so it has a vptr:
is compiled to:
Remark: C++ can be compiled to another high level language like C (as cfront did) or even to a C++ subset (here C++ without
virtual
). I put__
in compiler generated names.Note that this is a simplistic model where RTTI is not supported; real compilers will add data in the vtable to support
typeid
.Now, a simple derived class:
Non-virtual (*) base class subobjects are subobjects like member subobjects, but while member subobjects are complete objects, ie. their real (dynamic) type is their declared type, base class subobjects are not complete, and their real type change during construction.
(*) virtual bases are very different, like virtual member functions are different from non virtual members
Here "first position" means that the member must be first (other members could be re-ordered): they are located at offset zero so we can
reinterpret_cast
pointers, the types are compatibles; at non-zero offset, we would have to do pointer adjustments with arithmetic onchar*
.The lack of adjustment may not seem like a big deal in term of generated code (just some add immediate asm instructions), but it means much more than that, it means that such pointers can be viewed as having different types: an object of type
A__vtable*
can contain a pointer toDer__vtable
and be treated as either aDer__vtable*
or aA__vtable*
. The same pointer object serves as a pointer to aA__vtable
in functions dealing with objects of typeA
and as a pointer to aDer__vtable
in functions dealing with objects of typeDer
.You see that the dynamic type, as defined by the vptr, changes during construction as we assign a new value to the vptr (in this particular case the call to the base class constructor does nothing useful and can be optimised away, but it isn't the case with non trivial constructors).
With multiple inheritance:
A
C
instance will contain aA
and aB
, like that:Note that only one of these base class subobjects can have the privilege of sitting at offset zero; this is important in many ways:
conversion of pointers to other base classes (upcasts) will need an adjustment; conversely, upcasts need the opposite adjustments;
this implies that when doing a virtual call with a base class pointer, the
this
has the correct value for entry in the derived class overrider.So the following code:
can be compiled to
We see the
C__B__printaddr
declared type and semantics are compatibles withB__printaddr
, so we can use&C__B__printaddr
in the vtable ofB
;C__printaddr
is not compatible but can be used for calls involving aC
objects, or classes derived fromC
.A non virtual member function is like a free function that has access to internal stuff. A virtual member function is "flexibility point" which can be customised by overriding. virtual member function declaration play a special role in the definition of a class: like other members they are part of the contract with the external world, but at the same time they are part of a contract with derived class.
A non virtual base class is like a member object where we can refine behaviour via overriding (also we can access protected members). For the external world, the inheritance for
A
inDer
implies that implicit derived-to-base conversions will exist for pointers, that aA&
can be bound to aDer
lvalue, etc. For further derived classes (derived fromDer
), it also means that virtual functions ofA
are inherited in theDer
: virtual functions inA
can be overridden in further derived classes.When a class is further derived, say
Der2
is derived fromDer
, implicit conversions a pointers of typeDer2*
toA*
is semantically performed in step: first, a conversion toDer*
is validated (the access control to the inheritance relation ofDer2
fromDer
is checked with the usual public/protected/private/friend rules), then the access control ofDer
toA
. A non virtual inheritance relation cannot be refined or overridden in derived classes.Non virtual members functions can be directly called and virtual members must be called indirectly via the vtable (unless the real object type happens to be known by the compiler), so the
virtual
keyword adds an indirection to members functions access. Just like for function members, thevirtual
keyword adds an indirection to base object access; just like for functions, virtual base classes add a flexibility point in inheritance.When doing non-virtual, repeated, multiple inheritance :
There are just two
Top::i
subobjects inBottom
(Left::i
andRight::i
), as with members objects:Nobody is surprised that there are two
int
sub-members (l.t.i
andr.t.i
).With virtual functions:
it means there are two different (unrelated) virtual functions called
foo
, with distinct vtable entries (both as they have the same signature, they can have a common overrider).The semantic of non virtual base classes follows from the fact that basic, non virtual, inheritance is an exclusive relation: the inheritance relation established between Left and Top cannot be modified by a further derivation, so the fact that a similar relation exists between
Right
andTop
cannot affect this relation. In particular, it means thatLeft::Top::foo()
can be overridden inLeft
and inBottom
, butRight
, which has no inheritance relation withLeft::Top
, cannot set this customisation point.Virtual base classes are different: a virtual inheritance is a shared relation that can be customised in derived classes:
Here, this is only one base class subobject
Top
, only oneint
member.Implementation:
Room for non virtual base classes are allocated based on a static layout with fixed offsets in the derived class. Note that the layout of a derived class is the included in the layout of more derived class, so the exact position of subobjects does not depend on the real (dynamic) type of object (just like the address of a non virtual function is a constant). OTOH, the position of subobjects in a class with virtual inheritance is determined by the dynamic type (just like the address of the implementation of a virtual function is known only when the dynamic type is known).
The location of subobject will be determined at runtime with the vptr and the vtable (reuse of the existing vptr implies less space overhead), or a direct internal pointer to the subobject (more overhead, less indirections needed).
Because the offset of a virtual base class is determined only for a complete object, and cannot be known for a given declared type, a virtual base cannot be allocated at offset zero and is never a primary base. A derived class will never reuse the vptr of a virtual base as its own vptr.
In term of possible translation:
For an object of known type, access to the base class is through
vLeft__complete
:is translated to:
Here the real (dynamic) type of
r.m
is known and so is the relative position of the subobject is known at compile time. But here:the real (dynamic) type of
r
is not known, so the access is through the vptr:This function can accept any derived class with a different layout:
Note that the
vLeft
base class is at a fixed location in avBottom__subobject
, sovBottom__subobject.__ptr
is used as a vptr for the wholevBottom
.Semantics:
The inheritance relation is shared by all derived classes; this means that the right to override is shared, so
vRight
can overridevLeft::foo
. This creates a sharing of responsibilities:vLeft
andvRight
must agree on how they customiseTop
:Here we see a conflict:
vLeft
andvRight
seek to define the behaviour of the only foo virtual function, andvBottom
definition is in error for lack of a common overrider.Implementation:
The construction of class with non virtual base classes with non virtual base classes involves calling base class constructors in the same order as done for member variables, changing the dynamic type each time we enter a ctor. During construction, the base class subobjects really act as if they were complete objects (this is even true with impossible complete abstract base class subobjects: they are objects with undefined (pure) virtual functions). Virtual functions and RTTI can be called during construction (except of course pure virtual functions).
The construction of a class with non virtual base classes with virtual bases is more complicated: during construction, the dynamic type is the base class type, but the layout of virtual base is still the layout of the most derived type that is not yet constructed, so we need more vtables to describe this state:
The virtual functions are those of
vLeft
(during construction, the vBottom object lifetime has not begun), while the virtual base locations are those of avBottom
(as defined in thevBottom__complete
translated objected).Semantics:
During initialisation, it is obvious that we must be careful not to use an object before it is initialised. Because C++ gives us a name before an object is fully initialised, it is easy to do that:
or with the this pointer in the constructor:
It is pretty obvious that any use of
this
in the ctor-init-list must be carefully checked. After initialisation of all members,this
can be passed to other functions and registered in some set (until destruction begins).What is less obvious is that when construction of a class involving shared virtual bases, subobjects stop being constructed: during the construction of a
vBottom
:first the virtual bases are constructed: when
Top
is constructed, it is constructed like a normal subject (Top
doesn't even know it is virtual base)then the base classes are constructed in left to right order: the
vLeft
subobject is constructed and becomes functional as a normalvLeft
(but with avBottom
layout), so theTop
base class subobject now has avLeft
dynamic type;the
vRight
subobject construction begins, and the dynamic type of the base class changes to vRight; butvRight
isn't derived fromvLeft
, doesn't know anything aboutvLeft
, so thevLeft
base is now broken;when the body of the
Bottom
constructor begins, the types of all subobjects have stabilised andvLeft
is functional again.I am not sure how this answer can be taken as a complete answer without the mention of alignment or padding bits.
Let me give a bit background of Alignment:
"A memory address a, is said to be n-byte aligned when a is a multiple of n bytes (where n is a power of 2). In this context a byte is the smallest unit of memory access, i.e. each memory address specifies a different byte. An n-byte aligned address would have log2(n) least-significant zeros when expressed in binary.
The alternate wording b-bit aligned designates a b/8 byte aligned address (ex. 64-bit aligned is 8 bytes aligned).
A memory access is said to be aligned when the datum being accessed is n bytes long and the datum address is n-byte aligned. When a memory access is not aligned, it is said to be misaligned. Note that by definition byte memory accesses are always aligned.
A memory pointer that refers to primitive data that is n bytes long is said to be aligned if it is only allowed to contain addresses that are n-byte aligned, otherwise it is said to be unaligned. A memory pointer that refers to a data aggregate (a data structure or array) is aligned if (and only if) each primitive datum in the aggregate is aligned.
Note that the definitions above assume that each primitive datum is a power of two bytes long. When this is not the case (as with 80-bit floating-point on x86) the context influences the conditions where the datum is considered aligned or not.
Data structures can be stored in memory on the stack with a static size known as bounded or on the heap with a dynamic size known as unbounded." - from Wiki...
To maintain the alignment, compiler inserts padding bits in the compiled code of a structure/class object. " Although the compiler (or interpreter) normally allocates individual data items on aligned boundaries, data structures often have members with different alignment requirements. To maintain proper alignment the translator normally inserts additional unnamed data members so that each member is properly aligned. In addition the data structure as a whole may be padded with a final unnamed member. This allows each member of an array of structures to be properly aligned. .... ....
Padding is only inserted when a structure member is followed by a member with a larger alignment requirement or at the end of the structure" - Wiki
To get more info about how GCC does it, please look at
http://www.delorie.com/gnu/docs/gcc/gccint_111.html
and search for the text "basic-align"
Now lets come to this problem:
Using the example class, i have created this program for a GCC compiler running on a 64 bit Ubuntu.
And the result for this program is as the following:
Now let me explain it. As both A & B have virtual functions, they will create seperate VTABLEs and VPTR will be added at the beginning of their objects, respectively.
Hence object of class A will have a VPTR (pointing to the VTABLE of A) and an int. The pointer will be 8 byte long and the int will be 4 byte long. Hence before compile the size is 12 bytes. But the compiler will add extra 4 bytes at the end of int a as padding bits. Hence after compilation, the A's objects size will be 12+4 = 16.
Similarly for class B's objects.
Now object of C will have two VPTRs (one for each class A & class B) and 3 ints (a,b,c). So the size should have been 8 (VPTR A) + 4 (int a) + 4 (padding bytes) + 8 (VPTR B) + 4 (int b) + 4 (int c) = 32 bytes. So the total size of C will be 32 bytes.