C++ Multiple Inheritance Memory Layout with “Empty

Posted 2019-02-13 13:06

Question:

I know the memory layout of multiple inheritance is not defined, so I should not rely on it. However, can I rely on it in a special case? That is, a class has only one "real" superclass. All others are "empty classes", i.e., classes that have neither fields nor virtual methods (they only have non-virtual methods). In this case, these additional classes should not add anything to the memory layout of the class. (More concisely, in C++11 wording, the class is standard-layout.)

Can I infer that all the superclasses will have no offset? E.g.:

#include <iostream>

class X{

    int a;
    int b;
};

class I{};

class J{};

class Y : public I, public X,  public J{};

int main(){

    Y* y = new Y();
    X* x = y;
    I* i = y;
    J* j = y;

    std::cout << sizeof(Y) << std::endl 
                  << y << std::endl 
                  << x << std::endl 
                  << i << std::endl 
                  << j << std::endl;
}

Here, Y is the class with X being the only real base class. The output of the program (when compiled on Linux with g++ 4.6) is as follows:

8
0x233f010
0x233f010
0x233f010
0x233f010

As I concluded, there is no pointer adjustment. But is this implementation-specific, or can I rely on it? That is, if I receive an object of type I (and I know only these classes exist), can I use a reinterpret_cast to cast it to X?

My hope is that I can rely on it, because the spec says that the size of an object must be at least one byte. Therefore, the compiler cannot choose another layout. If it laid out I and J behind the members of X, their size would be zero (because they have no members). Therefore, the only reasonable choice is to place all superclasses at offset zero.

Am I correct, or am I playing with fire if I use reinterpret_cast from I to X here?

Answer 1:

In C++11 the compiler is required to use the Empty Base-class Optimization (EBO) for standard-layout types. See https://stackoverflow.com/a/10789707/981959

For your specific example all the types are standard-layout classes and don't have common base classes or members (see below), so you can rely on that behaviour in C++11. In practice, I think many compilers already followed that rule; G++ certainly did, as did others following the Itanium C++ ABI.
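If you want the compiler to confirm these assumptions rather than trusting them, something along these lines may help. It is only a sketch built on the classes from the question: the static_assert checks use the standard <type_traits> facilities, and bases_share_address is a hypothetical helper name introduced here for illustration. The sizeof check reflects the layout observed with G++ in the question; a compiler that lays the class out differently would simply fail to compile it, which is the point.

```cpp
#include <cassert>
#include <type_traits>

class X {
    int a;
    int b;
};

class I {};
class J {};

class Y : public I, public X, public J {};

// Compile-time checks (C++11). If any of these fails, the layout
// assumptions discussed above do not hold for this compiler.
static_assert(std::is_empty<I>::value && std::is_empty<J>::value,
              "I and J must be empty classes");
static_assert(std::is_standard_layout<Y>::value,
              "Y must be a standard-layout class");
static_assert(sizeof(Y) == sizeof(X),
              "the empty bases must not add storage to Y");

// Hypothetical helper: returns true if every base sub-object of y
// starts at the same address as y itself.
bool bases_share_address(Y& y) {
    const void* self = &y;
    return self == static_cast<X*>(&y)
        && self == static_cast<I*>(&y)
        && self == static_cast<J*>(&y);
}
```

Calling bases_share_address on any Y object (for example inside an assert at startup) gives a run-time confirmation to go with the compile-time checks.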

A caveat: make sure you don't have any base classes of the same type, because they must be at distinct addresses, e.g.

struct I {};

struct J : I {};
struct K : I { };

struct X { int i; };

struct Y : J, K, X { };

#include <iostream>

Y y;

int main()
{
  std::cout << &y << ' ' << &y.i << ' ' << (X*)&y << ' ' << (I*)(J*)&y << ' ' << (I*)(K*)&y << '\n';

}

prints:

0x600d60 0x600d60 0x600d60 0x600d60 0x600d61

For the type Y only one of the I bases can be at offset zero. The X sub-object is at offset zero (i.e. offsetof(Y, i) is zero) and one of the I bases is at the same address, but the other I base is (at least with G++ and Clang++) one byte into the object. So if you got an I* you couldn't reinterpret_cast it to X*, because you wouldn't know which I sub-object it pointed to: the I at offset 0 or the I at offset 1.

It's OK for the compiler to put the second I sub-object at offset 1 (i.e. inside the int) because I has no non-static data members, so you can't actually dereference or access anything at that address, only get a pointer to the object at that address. If you added non-static data members to I then Y would no longer be standard layout and would not have to use the EBO, and offsetof(Y, i) would no longer be zero.
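If you do need to get from an I* to the X sub-object in a hierarchy like this, one option is to go through the derived type with static_cast, so the compiler applies whatever pointer adjustment is required. The following is a minimal sketch using the same types as the example above. The helper names x_from_i_via_j and x_from_i_via_k are hypothetical, and each one assumes, without being able to check it, that the pointer really designates that particular I base of a complete Y object.

```cpp
#include <cassert>

struct I {};

struct J : I {};
struct K : I {};

struct X { int i; };

struct Y : J, K, X {};

// Assumption: p points at the I base that belongs to the J base of a Y.
// Passing any other I* is undefined behaviour.
X* x_from_i_via_j(I* p) {
    assert(p != nullptr);
    J* j = static_cast<J*>(p);  // base-to-derived: I -> J
    Y* y = static_cast<Y*>(j);  // base-to-derived: J -> Y
    return y;                   // implicit derived-to-base: Y -> X
}

// Assumption: p points at the I base that belongs to the K base of a Y.
X* x_from_i_via_k(I* p) {
    assert(p != nullptr);
    K* k = static_cast<K*>(p);  // base-to-derived: I -> K
    Y* y = static_cast<Y*>(k);  // base-to-derived: K -> Y
    return y;                   // implicit derived-to-base: Y -> X
}
```

A direct static_cast<Y*>(p) from I* would not compile here because I is an ambiguous base of Y, which is the same ambiguity that makes the reinterpret_cast unsafe. Naming the intermediate base (J or K) makes the caller state which I sub-object is meant.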