pay attention to this code :
#include <stdio.h>
void a(int a, int b, int c)
{
char buffer1[5];
char buffer2[10];
}
int main()
{
a(1,2,3);
}
after that :
gcc -S a.c
that command shows our source code in assembly.
now we can see in the main function, we never use "push" command to push the arguments of the a function into the stack. and it used "movel" instead of that
main:
pushl %ebp
movl %esp, %ebp
andl $-16, %esp
subl $16, %esp
movl $3, 8(%esp)
movl $2, 4(%esp)
movl $1, (%esp)
call a
leave
why does it happen? what's difference between them?
Is this on OS X by any chance? I read somewhere that it requires the stack pointer to be aligned at 16-byte boundaries. That could possibly explain this kind of code generation.
I found the article: http://blogs.embarcadero.com/eboling/2009/05/20/5607
The Pentium instruction set doesn't have an instruction for pushing a constant onto the stack. So using
push
would be slow: the program would have to put the constant in a register and push the register:So the compiler detects that using
movl
is faster. I guess you can try calling your function with a variable instead of a constant:gcc does all sorts of optimizations, including selecting instructions based upon execution speed of the particular CPU being optimized for. You will notice that things like
x *= n
is often replaced by a mix of SHL, ADD and/or SUB, especially when n is a constant; while MUL is only used when the average runtime (and cache/etc. footprints) of the combination of SHL-ADD-SUB would exceed that of MUL, orn
is not a constant (and thus using loops with shl-add-sub would come costlier).In case of function arguments: MOV can be parallelized by hardware, while PUSH cannot. (The second PUSH has to wait for the first PUSH to finish because of the update of the esp register.) In case of function arguments, MOVs can be run in parallel.
That code is just directly putting the constants (1, 2, 3) at offset positions from the (updated) stack pointer (esp). The compiler is choosing to do the "push" manually with the same result.
"push" both sets the data and updates the stack pointer. In this case, the compiler is reducing that to only one update of the stack pointer (vs. three). An interesting experiment would be to try changing function "a" to take only one argument, and see if the instruction pattern changes.
Here is what the gcc manual has to say about it:
Apparently
-maccumulate-outgoing-args
is enabled by default, overriding-mpush-args
. Explicitly compiling with-mno-accumulate-outgoing-args
does revert to thePUSH
method, here.