I write empty programs to annoy the hell out of stackoverflow coders, NOT. I am just exploring the gnu toolchain.
Now the following might be too deep for me, but to continue the empty program saga I have started to examine the output of the C compiler, the stuff GNU as consumes.
gcc version 4.4.0 (TDM-1 mingw32)
test.c:
int main()
{
    return 0;
}
gcc -S test.c
    .file "test.c"
    .def ___main; .scl 2; .type 32; .endef
    .text
    .globl _main
    .def _main; .scl 2; .type 32; .endef
_main:
    pushl %ebp
    movl %esp, %ebp
    andl $-16, %esp
    call ___main
    movl $0, %eax
    leave
    ret
Can you explain what happens here? Here is my effort to understand it. I have used the as manual and my minimal x86 ASM knowledge:
.file "test.c": the directive for the logical filename.
.def: according to the docs, "Begin defining debugging information for a symbol name". What is a symbol (a function name/variable?) and what kind of debugging information?
.scl: docs say "Storage class may flag whether a symbol is static or external". Is this the same static and external I know from C? And what is that '2'?
.type: stores the parameter "as the type attribute of a symbol table entry". I have no clue.
.endef: no problem.
.text: Now this is problematic. It seems to be something called a section, and I have read that it's the place for code, but the docs didn't tell me too much.
.globl: "makes the symbol visible to ld." The manual is quite clear on this.
_main: This might be the starting address (?) for my main function.
pushl: A long (32-bit) push, which places EBP on the stack.
movl: 32-bit move. Pseudo-C: EBP = ESP;
andl: Logical AND. Pseudo-C: ESP = -16 & ESP; I don't really see what the point of this is.
call: Pushes the IP to the stack (so the called procedure can find its way back) and continues where __main is. (What is __main?)
movl: this zero must be the constant I return at the end of my code. The MOV places this zero into EAX.
leave: restores the stack after an ENTER instruction (?). Why?
ret: goes back to the instruction address that is saved on the stack.
Thank you for your help!
Further to the andl $-16,%esp: this works because setting the low bits to zero will always adjust %esp down in value, and the stack grows downward on x86.

Commands starting with . are directives to the assembler.

    .file "test.c"
This just says this is "test.c"; that information can be exported to the debugging information of the exe.

    .def ___main; .scl 2; .type 32; .endef
The .def directive defines a debugging symbol. .scl 2 means storage class 2 (external storage class), and .type 32 says this symbol is a function. These numbers are defined by the PE/COFF executable format.
___main is a function that takes care of the bootstrapping that gcc needs (it'll do things like run C++ static initializers and other housekeeping).

    .text
Begins a text section; code lives here.

    .globl _main
Defines the _main symbol as global, which makes it visible to the linker and to other modules that are linked in.

    .def _main; .scl 2; .type 32; .endef
Same thing as for ___main: creates debugging symbols stating that _main is a function. This can be used by debuggers.

    _main:
Starts a new label (it'll end up as an address). The .globl directive above makes this address visible to other entities.

    pushl %ebp
Saves the old frame pointer (the ebp register) on the stack, so it can be put back in place when this function ends.

    movl %esp, %ebp
Copies the stack pointer into the ebp register. ebp is often called the frame pointer; it marks where the stack values of the current "frame" (usually a function) begin, and referring to variables on the stack via ebp can help debuggers.

    andl $-16, %esp
ANDs the stack pointer with 0xfffffff0, which effectively aligns it on a 16-byte boundary. Access to aligned values on the stack can be much faster than if they were unaligned. All these preceding instructions are pretty much a standard function prologue.

    call ___main
Calls the ___main function, which will do the initializing stuff that gcc needs. call pushes the return address (the address of the next instruction) on the stack and jumps to the address of ___main.

    movl $0, %eax
Moves 0 to the eax register (the 0 in return 0;). The eax register is used to hold integer function return values in the cdecl calling convention, which gcc uses by default (stdcall returns values in eax as well).

    leave
The leave instruction is pretty much shorthand for
    movl %ebp, %esp
    popl %ebp
i.e. it "undoes" the stuff done at the start of the function, restoring the frame pointer and stack to their former state.

    ret
Returns to whoever called this function. It'll pop the instruction pointer from the stack (which a corresponding call instruction will have placed there) and jump there.
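To see the storage class numbers for yourself, a small experiment along these lines might help. This is only a sketch: the file name scl_demo.c and both function names are made up for illustration, and the expected directives are what I would anticipate on a PE/COFF target such as mingw32, not something taken from the listing above.

```c
/* Hypothetical test file; the names are invented for illustration.
   Compile with: gcc -S scl_demo.c   and compare the .def lines. */

/* Internal linkage: on a COFF target I would expect ".scl 3" (static)
   and no ".globl" line for this one. */
static int hidden_helper(void)
{
    return 41;
}

/* External linkage: should get ".globl" plus ".scl 2", just like _main. */
int visible_answer(void)
{
    return hidden_helper() + 1;
}
```

Both functions should still carry .type 32, since that value only says "this symbol is a function"; the storage class is the part that follows C's static/extern distinction.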
There's a very similar exercise outlined here: http://en.wikibooks.org/wiki/X86_Assembly/GAS_Syntax
You've figured out most of it. I'll just make additional notes for emphasis and additions.

__main is a subroutine in the GNU standard library that takes care of various start-up initialization. It is not strictly necessary for C programs but is required just in case the C code is linking with C++.

_main is your main subroutine. As both _main and __main are code locations, they have the same storage class and type. I've not yet dug up the definitions for .scl and .type. You may get some illumination by defining a few global variables.

The first three instructions set up a stack frame, which is a technical term for the working storage of a subroutine: local and temporary variables for the most part. Pushing ebp saves the base of the caller's stack frame. Putting esp into ebp sets the base of our stack frame. The andl aligns the stack frame to a 16-byte boundary just in case any local variables on the stack require 16-byte alignment (the x86 SIMD instructions require that alignment, and alignment also speeds up ordinary types such as ints and floats).

At this point you'd normally expect esp to get moved down in memory to allocate stack space for local variables. Your main has none, so gcc doesn't bother.

The call to __main is special to the main entry point and won't typically appear in subroutines.

The rest goes as you surmised. Register eax is the place to put integer return codes in the binary spec. leave undoes the stack frame and ret goes back to the caller. In this case, the caller is the low-level C runtime, which will do additional magic (like calling atexit() functions, setting the exit code for the process, and asking the operating system to terminate the process).

I don't have all the answers, but I can explain what I know.
ebp is used by the function to store the initial state of esp during its flow, as a reference to where the arguments passed to the function are and where its own local variables are. The first thing a function does is save the ebp it was given with pushl %ebp (this is vital to the function that made the call), and then replace it with its own current stack position esp with movl %esp, %ebp. Zeroing the last 4 bits of esp at this point is GCC specific; I don't know why this compiler does that. It would work without doing it. Now finally we get down to business: call ___main. Who is __main? I don't know either... maybe more GCC-specific procedures. And finally the only thing your main() does: set the return value to 0 with movl $0, %eax, then leave, which is the same as doing movl %ebp, %esp; popl %ebp to restore the ebp state, then ret to finish. ret pops eip and continues the thread flow from that point, wherever it is (as it's main(), this ret probably leads to some C runtime procedure which handles the end of the program).

Most of it is all about managing the stack. I wrote a detailed tutorial about how the stack is used some time ago; it would be useful to explain why all those things are done. But it's in Portuguese...
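The stack bookkeeping described here can be mimicked in plain C. What follows is a toy model under stated assumptions, not how the CPU or gcc actually implements anything: the toy_cpu struct, its 64-word memory, and every function name are invented for illustration, and esp is treated as a byte offset into a small array rather than a real address.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_WORDS 64

/* Toy model: two registers plus a small stack (all names hypothetical). */
typedef struct {
    uint32_t esp;               /* stack pointer, a byte offset into mem */
    uint32_t ebp;               /* frame pointer */
    uint32_t mem[STACK_WORDS];  /* stack memory, indexed by esp / 4 */
} toy_cpu;

static void push32(toy_cpu *c, uint32_t v)
{
    /* Keep the toy stack pointer inside the array and word-aligned. */
    assert(c->esp >= 4 && c->esp <= STACK_WORDS * 4 && c->esp % 4 == 0);
    c->esp -= 4;                /* the stack grows downward */
    c->mem[c->esp / 4] = v;
}

static uint32_t pop32(toy_cpu *c)
{
    uint32_t v = c->mem[c->esp / 4];
    c->esp += 4;
    return v;
}

/* pushl %ebp; movl %esp, %ebp; andl $-16, %esp */
static void prologue(toy_cpu *c)
{
    push32(c, c->ebp);
    c->ebp = c->esp;
    c->esp &= (uint32_t)-16;
}

/* leave, modelled as: movl %ebp, %esp; popl %ebp */
static void do_leave(toy_cpu *c)
{
    c->esp = c->ebp;
    c->ebp = pop32(c);
}

/* Value of esp right after the prologue has run. */
uint32_t esp_after_prologue(uint32_t esp0, uint32_t ebp0)
{
    toy_cpu c = { esp0, ebp0, {0} };
    prologue(&c);
    return c.esp;
}

/* Returns 1 if prologue followed by leave restores both registers. */
int frame_round_trip(uint32_t esp0, uint32_t ebp0)
{
    toy_cpu c = { esp0, ebp0, {0} };
    prologue(&c);
    do_leave(&c);
    return c.esp == esp0 && c.ebp == ebp0;
}
```

Calling frame_round_trip(0xEC, 0xF8) walks through the same steps as the listing. The point of the model is that leave does not need to know how far the andl moved esp, because it reloads esp from ebp before popping.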
Regarding that andl $-16,%esp: in two's complement, -16 has every bit set except the lowest four (0xFFFFFFF0 as a 32-bit value).
So it will mask off the last 4 bits of ESP (btw: 2**4 equals 16) and will retain all other bits (no matter if the target system is 32 or 64 bits).
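The masking is easy to try out in C. This is a minimal sketch assuming 32-bit values; the function names align_down and and_minus_16 are mine, not anything from gcc or the listing.

```c
#include <assert.h>
#include <stdint.h>

/* Round addr down to a multiple of alignment (must be a power of two). */
uint32_t align_down(uint32_t addr, uint32_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    /* For alignment == 16 the mask is 0xFFFFFFF0, i.e. -16 as a 32-bit value. */
    return addr & ~(alignment - 1);
}

/* The same operation as in the listing: ESP = ESP & -16. */
uint32_t and_minus_16(uint32_t esp)
{
    return esp & (uint32_t)-16;
}
```

For example, and_minus_16(0x0028FF4C) gives 0x0028FF40, while an already aligned value such as 0x0028FF40 comes back unchanged. The result is never larger than the input, which is why the trick is safe on a stack that grows downward.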