With some background in assembly instructions and C programs, I can visualize what a compiled function would look like, but funnily enough I have never thought carefully about what a compiled C++ class would look like.
bash$ cat class.cpp
#include<iostream>
class Base
{
int i;
float f;
};
bash$ g++ -c class.cpp
I ran:
bash$ objdump -d class.o
bash$ readelf -a class.o
but the output is hard for me to understand.
Could somebody please explain it to me or suggest some good starting points?
Classes are (more or less) laid out as regular structs. Methods are (more or less...) converted into functions whose first parameter is "this". References to member variables become accesses at an offset from "this".
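To make that concrete, here is a rough sketch of the equivalence. The names Counter, CounterRaw, CounterRaw_add and CounterRaw_get are made up for illustration; a real compiler does not emit C++ like this, it just generates machine code that behaves much the same way.

```cpp
#include <cassert>

// A small class with one data member and two methods.
class Counter {
public:
    explicit Counter(int start) : value(start) {}
    void add(int n) { value += n; }
    int get() const { return value; }
private:
    int value;
};

// Roughly how the compiled result behaves: a plain struct plus free
// functions that receive the object's address as an explicit first parameter.
struct CounterRaw {
    int value;
};

// `self` plays the role of the hidden `this` pointer.
void CounterRaw_add(CounterRaw* self, int n) { self->value += n; }
int CounterRaw_get(const CounterRaw* self) { return self->value; }

// Runs the same operations through both versions and returns true if they agree.
bool counters_agree() {
    Counter c(10);
    c.add(5);
    CounterRaw r{10};
    CounterRaw_add(&r, 5);
    return c.get() == CounterRaw_get(&r);
}

// Small self-check; call it from main() if you want to try this out.
void check_counters() { assert(counters_agree()); }
```

On the usual compilers the two types also have the same size, since the methods take up no space inside the object.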
As for inheritance, let's quote from the C++ FAQ Lite, which is mirrored here: http://www.parashift.com/c++-faq-lite/virtual-functions.html#faq-20.4 . This chapter shows how virtual functions are called on real hardware (what the compiler generates in machine code).
Let's work an example. Suppose class Base has 5 virtual functions: virt0() through virt4().
// Your original C++ source code
class Base {
public:
virtual arbitrary_return_type virt0(...arbitrary params...);
virtual arbitrary_return_type virt1(...arbitrary params...);
virtual arbitrary_return_type virt2(...arbitrary params...);
virtual arbitrary_return_type virt3(...arbitrary params...);
virtual arbitrary_return_type virt4(...arbitrary params...);
...
};
Step #1: the compiler builds a static table containing 5 function-pointers, burying that table into static memory somewhere. Many (not all) compilers define this table while compiling the .cpp that defines Base's first non-inline virtual function. We call that table the v-table; let's pretend its technical name is Base::__vtable. If a function pointer fits into one machine word on the target hardware platform, Base::__vtable will end up consuming 5 hidden words of memory. Not 5 per instance, not 5 per function; just 5. It might look something like the following pseudo-code:
// Pseudo-code (not C++, not C) for a static table defined within file Base.cpp
// Pretend FunctionPtr is a generic pointer to a generic member function
// (Remember: this is pseudo-code, not C++ code)
FunctionPtr Base::__vtable[5] = {
&Base::virt0, &Base::virt1, &Base::virt2, &Base::virt3, &Base::virt4
};
Step #2: the compiler adds a hidden pointer (typically also a machine-word) to each object of class Base. This is called the v-pointer. Think of this hidden pointer as a hidden data member, as if the compiler rewrites your class to something like this:
// Your original C++ source code
class Base {
public:
...
FunctionPtr* __vptr;  // ← supplied by the compiler, hidden from the programmer
...
};
Step #3: the compiler initializes this->__vptr within each constructor. The idea is to cause each object's v-pointer to point at its class's v-table, as if it adds the following instruction in each constructor's init-list:
Base::Base(...arbitrary params...)
: __vptr(&Base::__vtable[0])  // ← supplied by the compiler, hidden from the programmer
...
{
...
}
Now let's work out a derived class. Suppose your C++ code defines class Der that inherits from class Base. The compiler repeats steps #1 and #3 (but not #2). In step #1, the compiler creates a hidden v-table, keeping the same function-pointers as in Base::__vtable but replacing those slots that correspond to overrides. For instance, if Der overrides virt0() through virt2() and inherits the others as-is, Der's v-table might look something like this (pretend Der doesn't add any new virtuals):
// Pseudo-code (not C++, not C) for a static table defined within file Der.cpp
// Pretend FunctionPtr is a generic pointer to a generic member function
// (Remember: this is pseudo-code, not C++ code)
FunctionPtr Der::__vtable[5] = {
&Der::virt0, &Der::virt1, &Der::virt2, &Base::virt3, &Base::virt4
};  // the last two entries (&Base::virt3 and &Base::virt4) are inherited as-is
In step #3, the compiler adds a similar pointer-assignment at the beginning of each of Der's constructors. The idea is to change each Der object's v-pointer so it points at its class's v-table. (This is not a second v-pointer; it's the same v-pointer that was defined in the base class, Base; remember, the compiler does not repeat step #2 in class Der.)
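(Not from the FAQ.) One quick way to see the single hidden pointer is to compare object sizes. This is only a sketch: the struct names Plain, Base and Der and both helper functions are made up here, and the exact sizes depend on compiler and platform, though mainstream compilers behave as the comments describe.

```cpp
#include <cassert>
#include <cstddef>

// Two data members, no virtual functions.
struct Plain {
    int i;
    float f;
};

// Same data members, plus virtual functions, so objects carry a v-pointer.
struct Base {
    int i;
    float f;
    virtual int virt0() { return 0; }
    virtual ~Base() = default;
};

// Overrides a virtual but adds no data members of its own.
struct Der : Base {
    int virt0() override { return 1; }
};

// Extra bytes per object caused by having virtual functions
// (implementation-specific; typically the size of one pointer, plus padding).
std::size_t vptr_overhead() { return sizeof(Base) - sizeof(Plain); }

// True if deriving did not add a second hidden pointer.
bool der_reuses_vptr() { return sizeof(Der) == sizeof(Base); }

// Small self-check; call it from main() if you want to try this out.
void check_sizes() {
    assert(vptr_overhead() > 0);
    assert(der_reuses_vptr());
}
```

Printing sizeof(Plain), sizeof(Base) and sizeof(Der) on your own machine shows the actual numbers for your compiler.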
Finally, let's see how the compiler implements a call to a virtual function. Your code might look like this:
// Your original C++ code
void mycode(Base* p)
{
p->virt3();
}
The compiler has no idea whether this is going to call Base::virt3() or Der::virt3() or perhaps the virt3() method of another derived class that doesn't even exist yet. It only knows for sure that you are calling virt3(), which happens to be the function in slot #3 of the v-table. It rewrites that call into something like this:
// Pseudo-code that the compiler generates from your C++
void mycode(Base* p)
{
p->__vptr[3](p);
}
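If you want something you can actually compile, the three steps can be imitated by hand in ordinary C++. This is only a sketch of the idea, not what any particular compiler emits: BaseObj, FunctionPtr, the *_vtable arrays and the make_*/call_* helpers are all invented names, and it uses two slots instead of five to keep it short.

```cpp
#include <cassert>

struct BaseObj;                          // forward declaration
using FunctionPtr = int (*)(BaseObj*);   // every slot takes the object pointer

// Step #2: the hidden v-pointer, written out as an ordinary member.
struct BaseObj {
    const FunctionPtr* vptr;
};

// "Virtual functions" written as free functions taking the object pointer.
int Base_virt0(BaseObj*) { return 0; }
int Base_virt1(BaseObj*) { return 1; }
int Der_virt0(BaseObj*) { return 100; }  // the derived class's override of slot 0

// Step #1: one static table per class, not per object.
const FunctionPtr Base_vtable[2] = { Base_virt0, Base_virt1 };
const FunctionPtr Der_vtable[2] = { Der_virt0, Base_virt1 };  // slot 1 inherited as-is

// Step #3: the "constructors" point each object at its class's table.
BaseObj make_base() { return BaseObj{ Base_vtable }; }
BaseObj make_der() { return BaseObj{ Der_vtable }; }

// The rewritten call: fetch the slot through the v-pointer, pass p as `this`.
int call_virt0(BaseObj* p) { return p->vptr[0](p); }
int call_virt1(BaseObj* p) { return p->vptr[1](p); }

// Small self-check; call it from main() if you want to try this out.
void check_dispatch() {
    BaseObj b = make_base();
    BaseObj d = make_der();
    assert(call_virt0(&b) == 0);    // base version
    assert(call_virt0(&d) == 100);  // overridden version
    assert(call_virt1(&d) == 1);    // inherited version
}
```

Note that call_virt0 never checks which kind of object it was given; the choice is made entirely by whichever table the object's pointer refers to.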
I strongly recommend that every C++ developer read the FAQ. It might take several weeks (it's long and hard to read), but it will teach you a lot about C++ and what can be done with it.
OK, there is nothing special about compiled classes. Compiled classes do not even exist as such. What exists are objects, which are flat chunks of memory (possibly with padding between fields), and standalone member functions somewhere in the code, which take a pointer to the object as their first parameter.
So an object of class Base should look something like this:
*(base_address) : i
*(base_address + sizeof(int)) : f
There may be padding between fields, but that is platform-specific and depends on the target's alignment requirements.
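You can ask the compiler for the actual offsets with offsetof. A small sketch: Base here is a copy of the class from the question with public members (offsetof needs access to them), and Padded plus the helper functions are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>   // offsetof, std::size_t

// Same fields as the class in the question, made public for offsetof.
struct Base {
    int i;
    float f;
};

// A char followed by an int usually leaves a gap so the int is aligned.
struct Padded {
    char c;
    int n;
};

// Byte offsets of each field from the start of the object.
std::size_t offset_of_i() { return offsetof(Base, i); }
std::size_t offset_of_f() { return offsetof(Base, f); }
std::size_t offset_of_n() { return offsetof(Padded, n); }

// Small self-check; call it from main() if you want to try this out.
void check_layout() {
    assert(offset_of_i() == 0);                  // first field sits at the object's address
    assert(offset_of_f() >= sizeof(int));        // f comes after i, possibly after padding
    assert(offset_of_n() % alignof(int) == 0);   // n is placed at an aligned offset
}
```

Printing these values (and sizeof(Padded)) shows where your compiler inserted padding, if anywhere.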
Also, in a debug build the class description can be found in the debug symbols, but that is compiler-specific. You should look for a program that dumps debug symbols for your compiler.
"Compiled classes" mean "compiled methods".
A method is an ordinary function with an extra parameter, usually passed in a register (ECX under the 32-bit Windows __thiscall convention, I believe; GCC on x86-64 Linux passes it in %rdi like any other first argument).
So C++ classes are not terribly different from a bunch of ordinary functions, except for name mangling and some magic in constructors/destructors for setting up vtables.
The main difference from reading C object files is that the C++ method names are mangled. You can try the -C (--demangle) option with objdump.
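If you want to demangle a name from inside a program instead, GCC and Clang provide abi::__cxa_demangle in <cxxabi.h>. A minimal sketch, assuming one of those compilers; the demangle wrapper and the check function are my own names, and the sample symbol is what a member function Base::get() would be called under the Itanium C++ ABI.

```cpp
#include <cassert>
#include <cstdlib>    // std::free
#include <cxxabi.h>   // abi::__cxa_demangle (GCC/Clang-specific)
#include <string>

// Returns the demangled form of `mangled`, or the input unchanged on failure.
std::string demangle(const char* mangled) {
    int status = 0;
    char* out = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
    if (status != 0 || out == nullptr) {
        return mangled;
    }
    std::string result(out);
    std::free(out);   // the returned buffer is allocated with malloc
    return result;
}

// Small self-check; call it from main() if you want to try this out.
void check_demangle() {
    assert(demangle("_ZN4Base3getEv") == "Base::get()");
}
```

This does the same job as objdump -C or the c++filt tool, just from within your own code.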
Try:
g++ -S class.cpp
That will give you an assembly file 'class.s' (text file) which you can read with a text editor.
However, your code doesn't do anything (declaring a class doesn't generate code on its own) so you won't have much in the assembly file.
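To get something worth looking at, give the class a few member functions defined outside the class body and a function that uses them. A sketch of what class.cpp could look like; the constructor, get_i, get_f and use_base are additions of mine, not part of the original question.

```cpp
#include <cassert>

class Base {
public:
    Base(int i_, float f_);   // declared here, defined out of line below
    int get_i() const;
    float get_f() const;
private:
    int i;
    float f;
};

// Out-of-line definitions: these are what show up as functions
// in the object file and in the assembly listing.
Base::Base(int i_, float f_) : i(i_), f(f_) {}
int Base::get_i() const { return i; }
float Base::get_f() const { return f; }

// A free function that uses the class, so there is a call site to look at too.
int use_base() {
    Base b(42, 1.5f);
    return b.get_i();
}

// Small self-check; call it from main() if you want to try this out.
void check_use_base() { assert(use_base() == 42); }
```

With g++ you should then find mangled symbols along the lines of _ZNK4Base5get_iEv in the output, and in the body of get_i a load from the address held in the "this" pointer.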
It looks like a C struct plus a set of functions, each taking an additional parameter that is a pointer to the struct.
The easiest way to follow what the compiler did is perhaps to build without optimisation, then load the code into a debugger and step through it in mixed source/assembler mode.
However, the point of the compiler is that you don't need to know this stuff (unless perhaps you are writing a compiler).