Consider the following simple shared library source code:
library.cpp:
static int global = 10;
int foo()
{
return global;
}
Compiled with -fPIC
option in clang, it results in this object assembly (x86-64):
foo(): # @foo()
push rbp
mov rbp, rsp
mov eax, dword ptr [rip + global]
pop rbp
ret
global:
.long 10 # 0xa
Since the symbol is defined inside the library, the compiler is using a PC relative addressing as expected: mov eax, dword ptr [rip + global]
However if we change static int global = 10;
to int global = 10;
making it a symbol with external linkage, the resulting assembly is:
foo(): # @foo()
push rbp
mov rbp, rsp
mov rax, qword ptr [rip + global@GOTPCREL]
mov eax, dword ptr [rax]
pop rbp
ret
global:
.long 10 # 0xa
As you can see the compiler added a layer of indirection with the Global Offset Table, which seems totally unnecessary in this case as the symbol is still defined inside the same library (and source file).
If the symbol was defined in another shared library, the GOT would be necessary, but in this case it feels redundant. Why is the compiler still adding this symbol to the GOT?
Note: I believe this question is similiar to this, however the answer was not pertinent maybe due to a lack of details.
In addition to details in Ross Ridge answer.
This is external vs internal linkage. Without
static
that variable has external linkage and is, therefore, accessible from any other translation unit. Any other translation unit can declare it asextern int global;
and access it.Linkage:
The Global Offset Table serves two purposes. One is to allow the dynamic linker "interpose" a different definition of the variable from the executable or other shared object. The second is to allow position independent code to be generated for references to variables on certain processor architectures.
ELF dynamic linking treats the entire process, the executable and all of the shared objects (dynamic libraries), as sharing one single global namespace. If multiple components (executable or shared objects) define the same global symbol then the dynamic linker normally chooses one definition of that symbol and all references to that symbol in all components refer to that one definition. (However, the ELF dynamic symbol resolution is complex and for various reasons different components can end up using different definitions of the the same global symbol.)
To implement this, when building a shared library the compiler will access global variables indirectly through the GOT. For each variable an entry in the GOT will be created containing a pointer to the variable. As your example code shows, the compiler will then use this entry to obtain the address of variable instead of trying to access it directly. When the shared object is loaded into a process the dynamic linker will determine whether any of the global variables have been superseded by variable definitions in another component. If so those global variables will have their GOT entries updated to point at the superseding variable.
By using the "hidden" or "protected" ELF visibility attributes it's possible to prevent global defined symbol from being superseded by a definition in another component, and thus removing the need to use the GOT on certain architectures. For example:
when compiled with
-O3 -fPIC
with the x86_64 port of GCC generates:As you can see, only
global_visible
uses the GOT,global_hidden
andlocal
don't use it. The "protected" visibility works similarly, it prevents the definition from being superseded but makes it still visible to the dynamic linker so it can be accessed by other components. The "hidden" visibility hides the symbol completely from the dynamic linker.The necessity of making code relocatable in order allow shared objects to be loaded a different addresses in different process means that statically allocated variables, whether they have global or local scope, can't be accessed with directly with a single instruction on most architectures. The only exception I know of is the 64-bit x86 architecture, as you see above. It supports memory operands that are both PC-relative and have large 32-bit displacements that can reach any variable defined in the same component.
On all the other architectures I'm familiar with accessing variables in position dependent manner requires multiple instructions. How exactly varies greatly by architecture, but it often involves using the GOT. For example, if you compile the example C code above with x86_64 port of GCC using the
-m32 -O3 -fPIC
options you get:The GOT is used for all three variable accesses, but if you look closely
global_hidden
andlocal
are handled differently thanglobal_visible
. With the later, a pointer to the variable is accessed through the GOT, with former two variables they're accessed directly through the GOT. This a fairly common trick among architectures where the GOT is used for all position independent variable references.The 32-bit x86 architecture is exceptional in one way here, since it has large 32-bit displacements and a 32-bit address space. This means that anywhere in memory can be accessed through the GOT base, not just the GOT itself. Most other architectures only support much smaller displacements, which makes the maximum distance something can be from the GOT base much smaller. Other architectures that use this trick will only put small (local/hidden/protected) variables in the GOT itself, large variables are stored outside the GOT and the GOT will contain a pointer to the variable just like with normal visibility global variables.