I have a function foo written in assembly, assembled with yasm and linked with GCC on 64-bit Linux (Ubuntu). It simply prints a message to stdout using puts(). Here is how it looks:
bits 64

extern puts
global foo

section .data
message:
    db 'foo() called', 0

section .text
foo:
    push rbp
    mov rbp, rsp
    lea rdi, [rel message]
    call puts
    pop rbp
    ret
It is called by a C program compiled with GCC:
extern void foo();

int main() {
    foo();
    return 0;
}
Build commands:
yasm -f elf64 foo_64_unix.asm
gcc -c foo_main.c -o foo_main.o
gcc foo_64_unix.o foo_main.o -o foo
./foo
Here is the problem:
When running the program, it prints an error message and immediately segfaults during the call to puts:
./foo: Symbol `puts' causes overflow in R_X86_64_PC32 relocation
Segmentation fault
After disassembling with objdump I see that the call is made with the wrong address:
0000000000000660 <foo>:
660: 90 nop
661: 55 push %rbp
662: 48 89 e5 mov %rsp,%rbp
665: 48 8d 3d a4 09 20 00 lea 0x2009a4(%rip),%rdi
66c: e8 00 00 00 00 callq 671 <foo+0x11> <-- here
671: 5d pop %rbp
672: c3 retq
(671 is the address of the next instruction, not the address of puts.)
However, if I rewrite the same code in C the call is done differently:
645: e8 c6 fe ff ff callq 510 <puts@plt>
i.e. it references puts from the PLT.
Is it possible to tell yasm to generate similar code?
The 0xe8 opcode is followed by a signed offset to be applied to the PC (which has advanced to the next instruction by that time) to compute the branch target. Hence objdump is interpreting the branch target as 0x671.

YASM is rendering zeros because it has likely put a relocation on that offset, which is how it asks the loader to populate the correct offset for puts during loading. The loader is encountering an overflow when computing the relocation, which may indicate that puts is at a further offset from your call than can be represented in a 32-bit signed offset. Hence the loader fails to fix this instruction, and you get a crash.

66c: e8 00 00 00 00 shows the unpopulated address. If you look in your relocation table, you should see a relocation on 0x66d. It is not uncommon for the assembler to populate addresses/offsets with relocations as all zeros.

This page suggests that YASM has a WRT directive that can control use of .got, .plt, etc.

Per section 9.2.5 of the NASM documentation, it looks like you can use CALL puts WRT ..plt (presuming YASM has the same syntax).

Your gcc is building PIE executables by default (32-bit absolute addresses no longer allowed in x86-64 Linux?).
I'm not sure why, but when doing so the linker doesn't automatically resolve call puts to call puts@plt. There is still a puts PLT entry generated, but the call doesn't go there.

At runtime, the dynamic linker tries to resolve puts directly to the libc symbol of that name and fix up the call rel32. But the symbol is more than +-2^31 away, so we get a warning about overflow of the R_X86_64_PC32 relocation. The low 32 bits of the target address are correct, but the upper bits aren't. (Thus your call jumps to a bad address.)

Your code works for me if I build with gcc -no-pie -fno-pie call-lib.c libcall.o. The -no-pie is the critical part: it's the linker option. Your YASM command doesn't have to change.

When making a traditional position-dependent executable, the linker turns the puts symbol for the call target into puts@plt for you, because we're linking a dynamic executable (instead of statically linking libc with gcc -static -fno-pie, in which case the call could go directly to the libc function).

Anyway, this is why gcc emits call puts@plt (GAS syntax) when compiling with -fpie (the default on your desktop, but not the default on https://godbolt.org/), but just call puts when compiling with -fno-pie.

See What does @plt mean here? for more about the PLT, and also Sorry state of dynamic libraries on Linux from a few years ago. (The modern gcc -fno-plt is like one of the ideas in that blog post.)

BTW, a more accurate/specific prototype would let gcc avoid zeroing EAX before calling foo: extern void foo(); in C means extern void foo(...);
You could declare it as extern void foo(void); which is what () means in C++. C++ doesn't allow function declarations that leave the args unspecified.

asm improvements
You can also put message in section .rodata (read-only data, linked as part of the text segment).

You don't need a stack frame, just something to align the stack by 16 before a call. A dummy push rax will do it.

Or we can tail-call puts by jumping to it instead of calling it, with the same stack position as on entry to this function. This works with or without PIE. Just replace call with jmp, as long as RSP is pointing at your own return address.

If you want to make PIE executables, you have two options:
call puts wrt ..plt - explicitly call through the PLT.
call [rel puts wrt ..got] - explicitly do an indirect call through the GOT entry, like gcc's -fno-plt style of code-gen. (Using a RIP-relative addressing mode to reach the GOT, hence the rel keyword.)

WRT = With Respect To. The NASM manual documents wrt ..plt, and see also section 7.9.3: special symbols and WRT.

Normally you would use default rel at the top of your file so you can actually use call [puts wrt ..got] and still get a RIP-relative addressing mode. You can't use a 32-bit absolute addressing mode in PIE or PIC code.

call [puts wrt ..got] assembles to a memory-indirect call using the function pointer that dynamic linking stored in the GOT. (Early-binding, not lazy dynamic linking.)

NASM documents ..got for getting the address of variables in section 9.2.3. Functions in (other) libraries are identical: you get a pointer from the GOT instead of calling directly, because the offset isn't a link-time constant and might not fit in 32 bits.

YASM also accepts call [puts wrt ..GOTPCREL], like AT&T syntax call *puts@GOTPCREL(%rip), but NASM does not.

In a position-dependent executable, you can use mov edi, message instead of a RIP-relative LEA. It's smaller code-size and can run on more execution ports on most CPUs.

In a non-PIE executable, you also might as well use call puts or jmp puts and let the linker sort it out, unless you want more efficient no-plt style dynamic linking. But if you do choose to statically link libc, I think this is the only way you'll get a direct jmp to the libc function.

(I think the possibility of static linking for non-PIE is why ld is willing to generate PLT stubs automatically for non-PIE, but not for PIE or shared libraries. It requires you to say what you mean when linking ELF shared objects.)

If you did use call puts in a PIE (call rel32), it could only work if you statically linked a position-independent implementation of puts into your PIE, so the entire thing was one executable that would get loaded at a random address at runtime (by the usual dynamic-linker mechanism), but simply didn't have a dependency on libc.so.6.