What does “a GP/function address pair” mean in IA-

What does "a GP/function address pair" mean in Itanium C++ ABI? What does GP stand for?

回答1:

Short explanation: gp is, for all practical means, a hidden parameter to all functions that comply with the Itanium ABI. It's a kind of this pointer to the global variables the function uses. As far as I know, no mainstream OS does it anymore.

GP stands for "globals pointer". It's a base address for data statically allocated by executables, and the Itanium architecture has a register just for it.

For instance, if you had these global variables and this function in your program:

int foo;
int bar;
int baz;

int func()
{
    foo++;
    bar += foo;
    baz *= bar / foo;
    return foo + bar + baz;
}

The gp/function pair would conceptually be &foo, &func. The code generated for func would refer to gp to find where the globals are located. The compiler knows foo can be found at gp, bar can be found at gp + 4 and baz can be found at gp + 8.

Assuming func is defined in an external library, if you call it from your program, the compiler will use a sequence of instructions like this one:

save current gp value to the stack;
load code address from the pair for func into some register;
load gp value from same pair into GP;
perform indirect call to the register where we stored the code address;
restore old gp value that we saved on the stack before, resume calling function.

This makes executables fully position-independent since they don't ever store absolute addresses to data symbols, and therefore makes it possible to maintain only one instance of any executable file in memory, no matter how many processes use it (you could even load the same executable multiple times within a single process and still only have one copy of the executable code systemwide), at the cost of making function pointers a little weird. With the Itanium ABI, a function pointer is not a code address (like it is with "regular" x86 ABIs): it's an address to a gp value and a code address, since that code address might not be worth much if it can't access its global variables, just like a method might not be able to do much if it doesn't have a this pointer.

The only other ABI I know that uses this concept was the Mac OS Classic PowerPC ABI. They called those pairs "transition vectors".

Since x86_64 supports RIP-relative addressing (x86 did not have an equivalent EIP-relative addressing), it's now pretty easy to create position-independent code without having to use an additional register or having to use "enhanced" function pointers. Code and data just have to be kept at constant offsets. Therefore, this part of the Itanium ABI is probably gone for good on Intel platforms.

From the Itanium Register Conventions:

8.2 The gp Register

Every procedure that references statically-allocated data or calls another procedure requires a pointer to its data segment in the gp register, so that it can access its static data and its linkage tables. Each load module has its own data segment, and the gp register must be set correctly prior to calling any entry point within that load module.

The linkage conventions require that each load module define exactly one gp value to refer to a location within its short data segment. It is expected that this location will be chosen to maximize the usefulness of short-displacement immediate instructions for addressing scalars and linkage table entries. The DLL loader will determine the absolute value of the gp register for each load module after loading its data segment into memory.

For calls within a load module, the gp register will remain unchanged, so calls known to be local can be optimized accordingly.

For calls between load modules, the gp register must be initialized with the correct gp value for the new load module, and the calling function must ensure that its own gp value is saved and restored.

回答2:

Just a comment about this quote from the other answer:

It is expected that this location will be chosen to maximize the usefulness of short-displacement immediate instructions for addressing scalars and linkage table entries.

What this is talking about: Itanium has three different ways to put a value into a register (where 'immediate' here means 'offset from the base'). You can support a full 64 bit offset from anywhere, but it takes two instructions:

// r34 has base address
movl r33 = <my immediate>
;;
add r35 = r34, r35
;;

Not only does that take 2 separate clocks, it takes 3 instruction slots across 2 bundles to make that happen.

There are two shorter versions: add14 (also adds) and add22 (also addl). The difference was in the immediate size each could handle. Each took a single 'A' slot iirc, and completed in a single clock.

add14 could use any register as the source & target, but could only handle up to 14 bit immediates.

add22 could use any register as the target, but for source, only two bits were allocated. So you could only use r0, r1, r2, r3 as the source regs. r0 is not a real register - it's hardwired to 0. But using one of the other 3 as a local stack registers, means you can address 256 times the memory using simple offsets, compared to using the local stack registers. Therefore, if you put your global base address into r1 (the convention), you could access that much more local offsets before having to do a separate movl and/or modifying gp for the next section of code.