I am trying to load the address of 'main' into a register (R10) with the GNU Assembler, but I can't get it to work. Here is what I have, and the error message I receive:
main:
lea main, %r10
I also tried the following syntax (this time using mov):
main:
movq $main, %r10
With both of the above I get the following error:
/usr/bin/ld: /tmp/ccxZ8pWr.o: relocation R_X86_64_32S against symbol `main' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: final link failed: Nonrepresentable section on output
collect2: error: ld returned 1 exit status
Compiling with -fPIC does not resolve the issue and just gives me the same exact error.
In x86-64, most immediates and displacements are still 32 bits, because 64-bit ones would waste too much code size (I-cache footprint and fetch/decode bandwidth).
lea main, %reg is an absolute disp32 addressing mode, which would stop load-time address randomization (ASLR) from choosing a random 64-bit (or 47-bit) address. So it's not supported on Linux outside of position-dependent executables, and not at all on macOS. (See the x86 tag wiki for links to docs and guides.) On Windows, you can build executables as "large address aware" or not; if you choose not, addresses will fit in 32 bits.
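To make the 32-bit limit concrete, here is a small C sketch of the range checks behind the two kinds of 32-bit absolute relocation: R_X86_64_32S (the field is sign-extended to 64 bits) and R_X86_64_32 (zero-extended). The function names are made up for illustration, and the example addresses are assumptions: 0x400000 is GNU ld's usual default start address for a non-PIE x86-64 executable, and 0x555555554000 is where a PIE typically lands when run under GDB with ASLR disabled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if addr survives being stored as 32 bits and
 * sign-extended back to 64 bits. This is the constraint on an R_X86_64_32S
 * field, e.g. a [disp32] absolute address or movq $imm32, %r64. */
static bool fits_sign_extended_32(uint64_t addr) {
    return addr <= UINT64_C(0x7fffffff) ||          /* low 2GiB */
           addr >= UINT64_C(0xffffffff80000000);    /* high 2GiB */
}

/* Hypothetical helper: true if addr survives being stored as 32 bits and
 * zero-extended back to 64 bits. This is the constraint on an R_X86_64_32
 * field, e.g. the immediate of mov $imm32, %r32. */
static bool fits_zero_extended_32(uint64_t addr) {
    return addr <= UINT64_C(0xffffffff);            /* low 4GiB */
}
```

A non-PIE address like 0x400000 passes both checks. A PIE address like 0x555555554000 fails both, so there would be no way to fix up a 32-bit absolute field after the executable is loaded there.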
The standard efficient way to put a static address into a register is a RIP-relative LEA:
# Use this, works everywhere
lea main(%rip), %r10 # 7 bytes
lea r10, [rip+main] # GAS .intel_syntax noprefix equivalent
lea r10, [rel main] # NASM equivalent, or use default rel
See How do RIP-relative variable references like "[RIP + _a]" in x86-64 GAS Intel-syntax work? for an explanation of the 3 syntaxes.
This uses a 32-bit relative displacement from the end of the current instruction, like jmp/call. It can reach any static data in .data, .bss, or .rodata, or any function in .text, assuming the usual 2GiB total size limit for static code+data.
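As a rough illustration of "relative to the end of the current instruction", here is a C sketch that encodes lea target(%rip), %r10 by hand, assuming you already know the final address of the instruction and of its target (normally the assembler or linker fills the rel32 in for you). encode_lea_rip_r10 is a hypothetical helper, and the addresses in the test are made up.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: encode `lea target(%rip), %r10` for an instruction
 * placed at insn_addr. Returns the length (7), or 0 if target is outside
 * the +-2GiB reach of a rel32. */
static int encode_lea_rip_r10(uint8_t out[7], uint64_t insn_addr, uint64_t target) {
    uint64_t end = insn_addr + 7;   /* RIP value seen by this instruction = its end */

    /* rel32 covers [-2^31, 2^31 - 1] bytes from the end of the instruction */
    if (target >= end ? target - end > UINT64_C(0x7fffffff)
                      : end - target > UINT64_C(0x80000000))
        return 0;

    uint32_t disp = (uint32_t)(target - end);      /* wraps to two's complement */
    out[0] = 0x4C;  /* REX.W (64-bit operand size) + REX.R (reg field is r8-r15) */
    out[1] = 0x8D;  /* LEA opcode */
    out[2] = 0x15;  /* ModRM: mod=00, reg=010 (r10 & 7), rm=101 = [RIP+disp32] */
    for (int i = 0; i < 4; i++)
        out[3 + i] = (uint8_t)(disp >> (8 * i));   /* little-endian disp32 */
    return 7;
}
```

For example, if this lea is the first instruction of main at 0x401000 and loads main's own address, the displacement is 0x401000 - 0x401007 = -7, stored as F9 FF FF FF.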
In position-dependent code on Linux (built with gcc -fno-pie -no-pie, for example), you can take advantage of 32-bit absolute addressing to save code size. Also, mov r32, imm32 has slightly better throughput than RIP-relative LEA on Intel/AMD CPUs, so out-of-order execution may be able to overlap it better with the surrounding code. (Optimizing for code size is usually less important than most other things, but when all else is equal, pick the shorter instruction. In this case all else is at least equal, or even better with mov imm32.)
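See 32-bit absolute addresses no longer allowed in x86-64 Linux? for more about how PIE executables are the default. That default is why you got a link error about -fPIC from your use of a 32-bit absolute address. Compiling with -fPIC doesn't help, because it only changes the code the compiler generates, not addressing modes you wrote by hand in asm.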
# in a non-PIE executable, mov imm32 into a 32-bit register is even better
mov $main, %r10d # 6 bytes
mov $main, %edi # 5 bytes: no REX prefix needed for a "legacy" register
Note that writing any 32-bit register always zero-extends into the full 64-bit register (R10 and RDI).
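The byte counts in those comments come straight from the encoding: mov $imm32, %r32 is opcode B8+rd followed by the 4-byte immediate, and only r8d-r15d need a REX prefix to reach the extra register bit. A minimal C encoder sketch (hypothetical helper name; register numbers follow the hardware numbering, 0=eax ... 7=edi, 8=r8d ... 15=r15d):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: encode `mov $imm32, %r32` (opcode B8+rd, then imm32).
 * Returns the encoded length: 5 bytes for eax..edi, 6 for r8d..r15d. */
static int encode_mov_r32_imm32(uint8_t out[6], unsigned reg, uint32_t imm) {
    assert(reg < 16);
    int n = 0;
    if (reg >= 8)
        out[n++] = 0x41;                       /* REX.B: selects r8d-r15d */
    out[n++] = (uint8_t)(0xB8 + (reg & 7));    /* register is in the opcode byte */
    for (int i = 0; i < 4; i++)
        out[n++] = (uint8_t)(imm >> (8 * i));  /* little-endian imm32 */
    return n;
}
```

Nothing in these bytes mentions the upper 32 bits of the destination; clearing them is done by the CPU whenever a 32-bit register is written.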
lea main, %edi or lea main, %rdi would also work in a Linux non-PIE executable, but never use LEA with a [disp32] absolute addressing mode (even in 32-bit code, where that doesn't require a SIB byte); mov is always at least as good.
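Part of why LEA loses in 64-bit code: the ModRM encoding that meant plain [disp32] in 32-bit mode (mod=00, rm=101) was repurposed for RIP-relative addressing, so an absolute disp32 has to go through a SIB byte with no base and no index. A C sketch of that encoding (hypothetical helper; hardware register numbering) comes out 2 bytes longer than mov $imm32 into the same 32-bit register:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: encode `lea disp32, %r32` using a 64-bit-mode absolute
 * [disp32] address. Returns the length: 7 for eax..edi, 8 for r8d..r15d. */
static int encode_lea_abs32_r32(uint8_t out[8], unsigned reg, uint32_t disp) {
    assert(reg < 16);
    int n = 0;
    if (reg >= 8)
        out[n++] = 0x44;                          /* REX.R: extends ModRM.reg */
    out[n++] = 0x8D;                              /* LEA opcode */
    out[n++] = (uint8_t)(((reg & 7) << 3) | 4);   /* ModRM: mod=00, rm=100 -> SIB follows */
    out[n++] = 0x25;                              /* SIB: index=100 (none), base=101 (disp32 only) */
    for (int i = 0; i < 4; i++)
        out[n++] = (uint8_t)(disp >> (8 * i));    /* little-endian disp32 */
    return n;
}
```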
The operand-size suffix is redundant when a register operand uniquely determines the size; I prefer to just write mov instead of movl or movq.
The stupid/bad way is a 10-byte 64-bit absolute address as an immediate:
# Inefficient, DON'T USE
movabs $main, %r10 # 10 bytes including the 64-bit absolute address
This is what you get in NASM if you use mov rdi, main instead of mov edi, main, so many people end up doing this. Linux dynamic linking does support runtime fixups for 64-bit absolute addresses, but the use case for that is jump tables, not absolute addresses as immediates.
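For comparison with the 5-byte mov $main, %edi, here is a C sketch of the movabs encoding: REX.W plus opcode B8+rd, followed by a full 8-byte immediate. The helper name is hypothetical; the point is that the length is always 10 bytes, no matter which register or value you use.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: encode `movabs $imm64, %r64` (REX.W + B8+rd, imm64). */
static int encode_movabs_r64_imm64(uint8_t out[10], unsigned reg, uint64_t imm) {
    assert(reg < 16);
    out[0] = (uint8_t)(0x48 | (reg >> 3));     /* REX.W, plus REX.B for r8-r15 */
    out[1] = (uint8_t)(0xB8 + (reg & 7));      /* register is in the opcode byte */
    for (int i = 0; i < 8; i++)
        out[2 + i] = (uint8_t)(imm >> (8 * i));  /* little-endian imm64 */
    return 10;
}
```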
movq $sign_extended_imm32, %reg (7 bytes) still uses a 32-bit absolute address, but wastes code bytes on a sign-extended mov to a 64-bit register, instead of relying on the implicit zero-extension to 64 bits that comes from writing a 32-bit register.
By using movq, you're telling GAS you want an R_X86_64_32S relocation instead of an R_X86_64_64 64-bit absolute relocation.
The only reason you'd ever want this encoding is for kernel code, where static addresses are in the upper 2GiB of 64-bit virtual address space instead of the lower 2GiB. mov has slight performance advantages over lea on some CPUs (e.g. running on more ports), but normally, if you can use a 32-bit absolute address at all, it's in the low 2GiB of virtual address space, where mov r32, imm32 works.
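A C sketch of the movq $imm32, %r64 encoding (REX.W + C7 /0 + imm32) shows why it only suits the low or high 2GiB: the CPU sign-extends the immediate, so this hypothetical helper refuses values outside that range. 0xffffffff81000000 is used as an example of a typical x86-64 Linux kernel text address with KASLR disabled; treat that as an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: encode `movq $imm32, %r64` (REX.W + C7 /0, imm32).
 * Returns 7, or 0 if value can't be produced by sign-extending 32 bits. */
static int encode_movq_r64_simm32(uint8_t out[7], unsigned reg, uint64_t value) {
    assert(reg < 16);
    if (!(value <= UINT64_C(0x7fffffff) || value >= UINT64_C(0xffffffff80000000)))
        return 0;                              /* outside the low/high 2GiB */
    uint32_t imm = (uint32_t)value;            /* CPU sign-extends this back */
    out[0] = (uint8_t)(0x48 | (reg >> 3));     /* REX.W, plus REX.B for r8-r15 */
    out[1] = 0xC7;                             /* MOV r/m64, imm32 */
    out[2] = (uint8_t)(0xC0 | (reg & 7));      /* ModRM: mod=11, reg=/0, rm=reg */
    for (int i = 0; i < 4; i++)
        out[3 + i] = (uint8_t)(imm >> (8 * i));  /* little-endian imm32 */
    return 7;
}
```

0x80000000 is rejected even though mov $imm32, %r32 could load it, because sign-extending its 32 bits would produce 0xffffffff80000000.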
PS: I intentionally left out any discussion of "large" or "huge" memory/code models, where RIP-relative ±2GiB addressing can't reach static data, or maybe can't even reach other code addresses. The above is for the x86-64 System V ABI's "small" and/or "small-PIC" code models. You may need movabs $imm64 for medium and large models, but that's very rare.
I don't know if mov $imm32, %r32 works in Windows x64 executables or DLLs with runtime fixups, but RIP-relative LEA certainly does.
Semi-related: Call an absolute pointer in x86 machine code. If you're JITing, try to put the JIT buffer near existing code so you can call rel32; otherwise, movabs a pointer into a register.
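A sketch of that choice for a JIT emitting a call, assuming System V x86-64 and picking %r11 as the scratch register (it's call-clobbered and not used for passing arguments, but any free register would do). emit_call is a hypothetical helper; the output buffer needs room for 13 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: emit a call from code located at `at` to `target`.
 * Uses a 5-byte `call rel32` when target is within +-2GiB of the end of the
 * call; otherwise `movabs $target, %r11` + `call *%r11` (13 bytes). */
static size_t emit_call(uint8_t *out, uint64_t at, uint64_t target) {
    uint64_t end = at + 5;   /* rel32 is relative to the end of the call */
    if (target >= end ? target - end <= UINT64_C(0x7fffffff)
                      : end - target <= UINT64_C(0x80000000)) {
        uint32_t rel = (uint32_t)(target - end);   /* two's-complement rel32 */
        out[0] = 0xE8;                              /* call rel32 */
        for (int i = 0; i < 4; i++)
            out[1 + i] = (uint8_t)(rel >> (8 * i));
        return 5;
    }
    out[0] = 0x49;                                  /* REX.W + REX.B (r11) */
    out[1] = 0xBB;                                  /* movabs $imm64, %r11 */
    for (int i = 0; i < 8; i++)
        out[2 + i] = (uint8_t)(target >> (8 * i));
    out[10] = 0x41;                                 /* REX.B */
    out[11] = 0xFF;                                 /* FF /2 = call r/m64 */
    out[12] = 0xD3;                                 /* ModRM: mod=11, reg=010, rm=011 (r11) */
    return 13;
}
```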