I have this short hello world program:
#include <stdio.h>
static const char* msg = "Hello world";
int main(){
    printf("%s\n", msg);
    return 0;
}
I compiled it into the following assembly code with gcc:
    .file "hello_world.c"
    .section .rodata
.LC0:
    .string "Hello world"
    .data
    .align 4
    .type msg, @object
    .size msg, 4
msg:
    .long .LC0
    .text
    .globl main
    .type main, @function
main:
.LFB0:
    .cfi_startproc
    pushl %ebp
    .cfi_def_cfa_offset 8
    .cfi_offset 5, -8
    movl %esp, %ebp
    .cfi_def_cfa_register 5
    andl $-16, %esp
    subl $16, %esp
    movl msg, %eax
    movl %eax, (%esp)
    call puts
    movl $0, %eax
    leave
    .cfi_restore 5
    .cfi_def_cfa 4, 4
    ret
    .cfi_endproc
.LFE0:
    .size main, .-main
    .ident "GCC: (Ubuntu 4.8.4-2ubuntu1~14.04.3) 4.8.4"
    .section .note.GNU-stack,"",@progbits
My question is: are all parts of this code essential if I were to write this program in assembly (instead of writing it in C and then compiling to assembly)? I understand the assembly instructions but there are certain pieces I don't understand. For instance, I don't know what .cfi* is, and I'm wondering if I would need to include this to write this program in assembly.
related: How to remove "noise" from GCC/clang assembly output?
The .cfi directives are not directly useful to you, and the program would work without them. (It's stack-unwind info needed for exception handling and backtraces, so -fomit-frame-pointer can be enabled by default. And yes, gcc emits this even for C.)
As far as the number of asm source lines needed to produce a valid Hello World program, obviously we want to use libc functions to do more work for us.
@Zwol's answer has the shortest implementation of your original C code.
Here's what you could do by hand, if you don't care about the exit status of your program, just that it prints your string.
The equivalent C (you just asked for the shortest Hello World, not one that had identical semantics):
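A minimal sketch of C with these semantics, not the answer's original listing. The body is written as a function with the hypothetical name hello_main so that other code can call it; renaming it to main gives the whole program. The string is assumed to be the one from the question. The second helper, visible_exit_status (also a hypothetical name), only illustrates how Linux truncates an exit value to its low 8 bits.

```c
#include <stdio.h>

/* Stand-in for main(): its return value is whatever puts() returns,
 * so the caller learns nothing beyond "non-negative on success". */
int hello_main(void)
{
    return puts("Hello world");   /* string assumed from the question */
}

/* What a parent process would observe if `value` were passed to exit:
 * Linux keeps only the low 8 bits of the status. */
int visible_exit_status(int value)
{
    return value & 0xFF;
}
```

Because the call to puts() is the last thing the function does, this is the shape of code for which a tail call (a jmp instead of call followed by ret) is possible.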
Its exit status is undefined, but it definitely prints. puts(3) returns "a non-negative number", which could be outside the 0..255 range, so we can't say anything about the program's exit status being 0 / non-zero in Linux (where the process's exit status is the low 8 bits of the integer passed to the exit_group() system call, in this case by the CRT startup code that called main()).
Using JMP to implement the tail-call is a standard practice, and commonly used when a function doesn't need to do anything after another function returns. puts() will eventually return to the function that called main(), just like if puts() had returned to main() and then main() had returned. main()'s caller still has to deal with the args it put on the stack for main(), because they're still there (but modified, and we're allowed to do that).
gcc and clang don't generate code that modifies arg-passing space on the stack. It is perfectly safe and ABI-compliant, though: functions "own" their args on the stack, even if they were const. If you call a function, you can't assume that the args you put on the stack are still there. To make another call with the same or similar args, you need to store them all again.
Also note that this calls puts() with the same stack alignment that we had on entry to main(), so again we're ABI-compliant in preserving the 16B alignment required by modern versions of the x86-32 aka i386 System V ABI (used by Linux).
.string zero-terminates strings, same as .asciz, but I had to look it up to check. I'd recommend just using .ascii or .asciz to make sure you're clear on whether your data has a terminating byte or not. (You don't need one if you use it with explicit-length functions like write().)
In the x86-64 System V ABI (and Windows), args are passed in registers. This makes tail-call optimization a lot easier, because you can rearrange args or pass more args (as long as you don't run out of registers). This makes compilers willing to do it in practice. (Because as I said, they currently don't generate code that modifies the incoming arg space on the stack, even though the ABI is clear that they're allowed to, and compiler generated functions assume that functions clobber their stack args.)
clang or gcc -O3 will do this optimization for x86-64, as you can see on the Godbolt compiler explorer:
Static data addresses always fit in the low 31 bits of address space, and executables don't need position-independent code; otherwise the mov would be lea .LC0(%rip), %rdi. (You'll get this from gcc if it was configured with --enable-default-pie to make position-independent executables.)
Hello World using 32-bit x86 Linux system calls directly, no libc
I originally wrote this for SO Docs (topic ID: 1164, example ID: 19078), rewriting a basic less-well-commented example by @runner. It's in NASM syntax, so it's not a perfect fit for this question.
If you don't already know low-level Unix systems programming, you might want to just write functions in asm that take args and return a value (or update arrays via a pointer arg) and call them from C or C++ programs. Then you can just worry about learning how to handle registers and memory, without also learning the POSIX system-call API and the ABI for using it. That also makes it very easy to compare your code with compiler output for a C implementation. Compilers usually do a pretty good job at making efficient code, but are rarely perfect.
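As a sketch of the kind of function meant here (takes args, returns a value, reads an array through a pointer), something like the following could serve as the C reference implementation. The name sum_array is hypothetical and not from the original answer.

```c
#include <stddef.h>

/* Add up the n ints starting at a; returns 0 when n is 0. */
int sum_array(const int *a, size_t n)
{
    int total = 0;
    for (size_t i = 0; i < n; i++)
        total += a[i];
    return total;
}
```

Compiling a file like this with gcc -S gives assembly to compare against a hand-written version of the same function, while the C caller takes care of I/O and process startup.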
libc provides wrapper functions for system calls, so compiler-generated code would call write rather than invoking it directly with int 0x80 (or if you care about performance, sysenter). (In x86-64 code, use syscall for the 64-bit ABI.) See also syscalls(2).
System calls are documented in section 2 manual pages, like write(2). See the NOTES section for differences between the libc wrapper function and the underlying Linux system call. Note that the wrapper for sys_exit is _exit(2), not the exit(3) ISO C function that flushes stdio buffers and other cleanup first. There's also an exit_group system call that ends all threads. exit(3) actually uses that, because there's no downside in a single-threaded process.
This code makes 2 system calls:
sys_write(1, "Hello, World!\n", sizeof(...));
sys_exit(0);
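For comparison, here is a rough C sketch of the same two system calls. It is not the NASM program from the answer. It assumes Linux with glibc and goes through the generic syscall() function instead of the write() and _exit() wrappers. The names hello_msg, HELLO_LEN, hello_write and hello_exit are made up for this illustration.

```c
#define _GNU_SOURCE          /* makes glibc declare syscall() */
#include <sys/syscall.h>     /* SYS_write, SYS_exit */
#include <unistd.h>          /* syscall() */

/* Same text as in the pseudocode above. */
static const char hello_msg[] = "Hello, World!\n";

/* sizeof is evaluated at compile time, so the length is a constant
 * rather than a value loaded from memory. Subtract 1 to drop the
 * terminating NUL, which an explicit-length write does not need. */
enum { HELLO_LEN = sizeof hello_msg - 1 };

/* write(1, hello_msg, HELLO_LEN) without the libc wrapper.
 * Returns the number of bytes written, or -1 on error. */
long hello_write(void)
{
    return syscall(SYS_write, 1, hello_msg, (size_t)HELLO_LEN);
}

/* Raw exit system call with status 0: ends the calling thread
 * and does not flush stdio buffers. */
void hello_exit(void)
{
    syscall(SYS_exit, 0);
}
```

A main() that calls hello_write() and then hello_exit() would make the same pair of calls, although the C startup code and dynamic linker make many more system calls before main() runs.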
I commented it heavily (to the point where it's starting to obscure the actual code without color syntax highlighting). This is an attempt to point things out to total beginners, not how you should comment your code normally.
Notice that we don't store the string length in data memory anywhere. It's an assemble-time constant, so it's more efficient to have it as an immediate operand than a load. We could also have pushed the string data onto the stack with three push imm32 instructions, but bloating the code-size too much isn't a good thing.
On Linux, you can save this file as Hello.asm and build a 32-bit executable from it with these commands:
See this answer for more details on building assembly into 32 or 64-bit static or dynamically linked Linux executables, for NASM/YASM syntax or GNU AT&T syntax with GNU as directives. (Key point: make sure to use -m32 or equivalent when building 32-bit code on a 64-bit host, or you will have confusing problems at run-time.)
You can trace its execution with strace to see the system calls it makes:
Compare this with the trace for a dynamically linked process (like gcc makes from hello.c, or from running strace /bin/ls) to get an idea just how much stuff happens under the hood for dynamic linking and C library startup.
The trace on stderr and the regular output on stdout are both going to the terminal here, so they interfere in the line with the write system call. Redirect or trace to a file if you care. Notice how this lets us easily see the syscall return values without having to add code to print them, and is actually even easier than using a regular debugger (like gdb) to single-step and look at eax for this. See the bottom of the x86 tag wiki for gdb asm tips. (The rest of the tag wiki is full of links to good resources.)
The x86-64 version of this program would be extremely similar, passing the same args to the same system calls, just in different registers and with syscall instead of int 0x80. See the bottom of What happens if you use the 32-bit int 0x80 Linux ABI in 64-bit code? for a working example of writing a string and exiting in 64-bit code.
related: A Whirlwind Tutorial on Creating Really Teensy ELF Executables for Linux. The smallest binary file you can run that just makes an exit() system call. That is about minimizing the binary size, not the source size or even just the number of instructions that actually run.
The absolute bare minimum that will work on the platform that this appears to be, is
But this breaks a number of ABI requirements. The minimum for an ABI-compliant program is
Everything else in your object file is either the compiler not optimizing the code down as tightly as possible, or optional annotations to be written to the object file.
The .cfi_* directives, in particular, are optional annotations. They are necessary if and only if the function might be on the call stack when a C++ exception is thrown, but they are useful in any program from which you might want to extract a stack trace. If you are going to write nontrivial code by hand in assembly language, it will probably be worth learning how to write them. Unfortunately, they are very poorly documented; I am not currently finding anything that I think is worth linking to.
The line
    .section .note.GNU-stack,"",@progbits
is also important to know about if you are writing assembly language by hand; it is another optional annotation, but a valuable one, because what it means is "nothing in this object file requires the stack to be executable." If all the object files in a program have this annotation, the kernel won't make the stack executable, which improves security a little bit.
(To indicate that you do need the stack to be executable, you put "x" instead of "". GCC may do this if you use its "nested function" extension. (Don't do that.))
It is probably worth mentioning that in the "AT&T" assembly syntax used (by default) by GCC and GNU binutils, there are three kinds of lines: A line with a single token on it, ending in a colon, is a label. (I don't remember the rules for what characters can appear in labels.) A line whose first token begins with a dot, and does not end in a colon, is some kind of directive to the assembler. Anything else is an assembly instruction.
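The three-way rule above can be sketched as a small C classifier. This is only an illustration of the simplified rule as stated, not how the GNU assembler parses its input: it ignores comments and treats a label followed by an instruction on the same line as an instruction. The names line_kind and classify_line are hypothetical, and "does not end in a colon" is read here as applying to the first token.

```c
#include <ctype.h>
#include <string.h>

enum line_kind { LINE_BLANK, LINE_LABEL, LINE_DIRECTIVE, LINE_INSTRUCTION };

/* Classify one line of AT&T-syntax assembly by the simplified rule:
 * single token ending in ':' -> label; first token starting with '.'
 * and not ending in ':' -> directive; anything else -> instruction. */
enum line_kind classify_line(const char *line)
{
    /* Skip leading whitespace. */
    while (isspace((unsigned char)*line))
        line++;
    if (*line == '\0')
        return LINE_BLANK;

    /* The first token runs up to the next whitespace character. */
    size_t first_len = strcspn(line, " \t\r\n");

    /* Anything other than whitespace after it means more tokens. */
    const char *rest = line + first_len;
    while (isspace((unsigned char)*rest))
        rest++;

    int single_token = (*rest == '\0');
    int ends_in_colon = (line[first_len - 1] == ':');

    if (single_token && ends_in_colon)
        return LINE_LABEL;
    if (line[0] == '.' && !ends_in_colon)
        return LINE_DIRECTIVE;
    return LINE_INSTRUCTION;
}
```

Applied to the compiler output in the question, .LC0: and main: come out as labels, .cfi_startproc and .string "Hello world" as directives, and pushl %ebp as an instruction.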