I found this:
Because the stack is cleaned by the called function, the __stdcall
calling convention creates smaller executables than __cdecl, in which
the code for stack cleanup must be generated for each function call.
Suppose I have two functions:
void __cdecl func1(int x)
{
//do some stuff using x
}
void __stdcall func2(int x, int y)
{
//do some stuff using x, y
}
and here in main():
int main()
{
func1(5);
func2(5, 6);
}
IMO, it is main()'s responsibility to clean up the stack for the call to func1(5), and func2 will clean up the stack for the call to func2(5, 6), right?
Four questions:
1. For the call to func1 in main(), it's main's responsibility to clean up the stack, so will the compiler insert some code (to clean up the stack) before and after the call to func1? Like this:
int main()
{
before_call_to_cdecl_func(); //compiler generated code for stack-clean-up of cdecl-func-call
func1(5);
after_call_to_cdecl_func(); //compiler generated code for stack-clean-up of cdecl-func-call
func2(5, 6);
}
2. For the call to func2 in main(), it's func2's own job to clean up the stack, so I presume no code will be inserted in main() before or after the call to func2, right?
3. Because func2 is __stdcall, I presume the compiler will automatically insert code (to clean up the stack) like this:
void __stdcall func1(int x, int y)
{
before_call_to_stdcall_func(); //compiler generated code for stack-clean-up of stdcall-func-call
//do some stuff using x, y
after_call_to_cdecl_func(); //compiler generated code for stack-clean-up of stdcall-func-call
}
Do I presume right?
4. Finally, back to the quoted words: why does __stdcall result in a smaller executable than __cdecl? And there is no such thing as __stdcall on Linux, right? Does that mean a Linux ELF will always be larger than a Windows exe?
1. It'll only insert code after the call, which resets the stack pointer, and only if there were call arguments.*
2. __stdcall generates no cleanup code at the call site. Note, however, that compilers can accumulate the stack cleanup from multiple __cdecl calls into one cleanup, or delay the cleanup to prevent pipeline stalls.
3. Ignoring the inverted order in this example, no, it'll only insert code to clean up the __cdecl function; setting up the function arguments is something different (different compilers generate/prefer different methods).
4. __stdcall was more of a Windows thing. The size of the binary depends on the number of calls to the __cdecl functions: more calls mean more cleanup code, whereas a __stdcall function has only a single instance of cleanup code. However, you shouldn't see much of a size increase, as at most you have a few bytes per call.
*It's important to distinguish between cleanup and setting up call parameters.
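To put rough numbers on the size argument above, here is a back-of-the-envelope sketch. It assumes the usual 32-bit x86 encodings (a 3-byte "add esp, N" after each __cdecl call, and a 3-byte "ret N" instead of a 1-byte "ret" in each __stdcall function) and one return instruction per function. The helper names are made up for illustration, and a real optimizer may merge or skip cleanups, so treat the __cdecl figure as an upper bound, not a measurement.

```cpp
#include <cstddef>

// Assumed 32-bit x86 instruction sizes (optimizers may choose other forms):
//   add esp, imm8 -> 3 bytes, emitted by the caller after a __cdecl call
//   ret           -> 1 byte,  plain return used by a __cdecl function
//   ret imm16     -> 3 bytes, return-and-pop used by a __stdcall function
constexpr std::size_t kAddEspImm8 = 3;
constexpr std::size_t kRet = 1;
constexpr std::size_t kRetImm16 = 3;

// Hypothetical helper: cleanup bytes if every call is a __cdecl call.
// One "add esp, N" per call site, assuming no cleanups are merged.
constexpr std::size_t cdecl_cleanup_bytes(std::size_t call_sites) {
    return call_sites * kAddEspImm8;
}

// Hypothetical helper: extra bytes if every function is __stdcall.
// Each function pays once for the longer "ret N"; call sites pay nothing.
constexpr std::size_t stdcall_cleanup_bytes(std::size_t functions) {
    return functions * (kRetImm16 - kRet);
}
```

With 10 functions and 100 call sites, this gives 300 bytes of cleanup for __cdecl against 20 extra bytes for __stdcall, which is why the saving exists but stays small.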
Historically, the first C++ compilers used the equivalent of
__stdcall. From a quality of implementation point of view, I'd expect
the C compiler to use the __cdecl conventions, and the C++ compiler
the __stdcall conventions (which were known as the Pascal conventions
back then). This is one thing that the early Zortech compilers got right.
Of course, vararg functions must still use __cdecl conventions. The
callee can't clean up the stack if it doesn't know how much to clean up.
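A minimal sketch of why this is so, using a made-up function sum_ints: the callee only learns the number of arguments at run time, from its first parameter, so the compiler has no fixed byte count to put into a callee-side cleanup. Each caller, on the other hand, knows exactly what it pushed.

```cpp
#include <cassert>
#include <cstdarg>

// Hypothetical example function: adds up `count` ints passed as varargs.
// The argument count is only known at run time (via `count`), so a fixed
// callee-side "ret N" cannot be generated; the caller cleans up instead.
int sum_ints(int count, ...) {
    va_list args;
    va_start(args, count);
    int total = 0;
    for (int i = 0; i < count; ++i) {
        total += va_arg(args, int);
    }
    va_end(args);
    return total;
}

// Usage: two call sites that pass different numbers of arguments.
inline void demo_sum_ints() {
    assert(sum_ints(2, 10, 20) == 30);
    assert(sum_ints(4, 1, 2, 3, 4) == 10);
}
```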
(Note that the C standard was carefully designed to allow the
__stdcall conventions in C as well. I only know of one compiler which
took advantage of this, however; the amount of existing code at the time
which called vararg functions without a prototype in view was enormous,
and while the standard declared it broken, compiler implementors didn't
want to break their clients' code.)
In many circles, there seems to be a very strong tendency to insist
that the C and the C++ conventions be the same, that one can take the
address of an extern "C++" function and pass it to a function written
in C which calls it. IIRC, for example, g++ doesn't treat
extern "C" void f();
and
void f();
as having two different types (although the standard requires it), and
allows passing the address of a static member function to
pthread_create, for example. The result is that such compilers use
the exact same conventions everywhere, and on Intel, they are the
equivalent of __cdecl.
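For code that does not want to rely on this leniency, the strictly conforming route is to hand the C function a callback that itself has C linkage. The sketch below uses qsort rather than pthread_create, purely so that it needs no threading library; the names compare_ints, sort_ints and demo_sort_ints are invented for the example.

```cpp
#include <cassert>
#include <cstdlib>

// A comparator with C language linkage: its type is "pointer to
// extern "C" function", which is what a C API formally expects.
extern "C" int compare_ints(const void* lhs, const void* rhs) {
    const int a = *static_cast<const int*>(lhs);
    const int b = *static_cast<const int*>(rhs);
    return (a > b) - (a < b);  // negative, zero, or positive
}

// Hypothetical helper: sorts an int array via the C library's qsort.
inline void sort_ints(int* values, std::size_t count) {
    std::qsort(values, count, sizeof(int), compare_ints);
}

// Usage: the C library calls back into compare_ints.
inline void demo_sort_ints() {
    int values[] = {3, 1, 2};
    sort_ints(values, 3);
    assert(values[0] == 1 && values[1] == 2 && values[2] == 3);
}
```

The same pattern applies to a thread start routine: an extern "C" function taking and returning void* can forward to a static member function.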
Many compilers have extensions to support other conventions. (Why they
don't use the standard extern "xxx", I don't know.) The syntax for
these extensions varies widely, however. Microsoft puts the attribute
directly before the function name:
void __stdcall func( int, int );
while g++ puts it in a special attribute clause after the function
declaration:
void func( int, int ) __attribute__((stdcall));
C++11 has added a standard way of specifying attributes:
[[stdcall]] void func( int, int );
It doesn't specify stdcall as an attribute, but it does specify that
additional attributes (other than those defined in the standard) may be
specified, and are implementation dependent. I expect that both g++ and
VC++ accept this syntax in their most recent versions, at least if C++11
is activated. The exact name of the attribute (__stdcall, stdcall,
etc.) may vary, however, so you probably want to wrap this in a macro.
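One possible shape for such a macro is sketched below. The macro name MY_STDCALL and the function add_stdcall are made up; the sketch uses the older vendor spellings (__stdcall for Microsoft, __attribute__((stdcall)) for g++) rather than the C++11 syntax, and it expands to nothing on targets other than 32-bit x86, where the convention does not apply.

```cpp
#include <cassert>

// Hypothetical macro name; choose whatever fits your code base.
#if defined(_MSC_VER) && defined(_M_IX86)
#  define MY_STDCALL __stdcall
#elif defined(__GNUC__) && defined(__i386__)
#  define MY_STDCALL __attribute__((stdcall))
#else
   // Not 32-bit x86: the convention does not apply, so expand to nothing.
#  define MY_STDCALL
#endif

// The macro sits between the return type and the function name,
// a position that both spellings above accept.
int MY_STDCALL add_stdcall(int x, int y) {
    return x + y;
}

// Usage: call sites look the same whatever the macro expands to.
inline void demo_add_stdcall() {
    assert(add_stdcall(5, 6) == 11);
}
```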
Finally: in a modern compiler with optimization turned on, the
difference in the calling conventions is probably negligible.
Attributes like const (not to be confused with the C++ keyword
const), regparm or noreturn will probably have a larger impact,
both in terms of executable size and performance.
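As a rough illustration of two of these attributes in their g++ spelling (regparm is left out because it only applies to 32-bit x86), here is a small sketch. The macro and function names are invented, and the macros expand to nothing on compilers that do not define __GNUC__.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical wrapper macros around the g++ attribute syntax.
#if defined(__GNUC__)
#  define ATTR_CONST __attribute__((const))
#  define ATTR_NORETURN __attribute__((noreturn))
#else
#  define ATTR_CONST
#  define ATTR_NORETURN
#endif

// const: the result depends only on the argument and the function has
// no side effects, so the compiler may merge repeated calls.
ATTR_CONST int square(int x) {
    return x * x;
}

// noreturn: control never comes back, so the compiler need not
// generate code for the path after a call to fatal().
ATTR_NORETURN void fatal() {
    std::abort();
}

// Usage: rejects negative input, otherwise returns the square.
int checked_square(int x) {
    if (x < 0) {
        fatal();
    }
    return square(x);
}

inline void demo_attributes() {
    assert(checked_square(7) == 49);
}
```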
This whole crowd of calling conventions has been made history by the new 64-bit ABI.
http://en.wikipedia.org/wiki/X86_calling_conventions#x86-64_calling_conventions
There is also the ABI side of things for other architectures (like ARM).
Not everything executes the same way on all architectures, so do not bother thinking about this calling convention thing!
http://en.wikipedia.org/wiki/Calling_convention
The EXE size improvement is insignificant (maybe nonexistent), so do not bother.
__cdecl is much more flexible than __stdcall: it supports a variable number of arguments, its cleanup code (a single instruction) is insignificant, and a __cdecl function can be called with the wrong number of arguments without this necessarily causing a serious problem! The same situation with __stdcall always goes wrong!
Others have answered the other parts of your question, so I'll just add my answer about the size:
4.Finally, back to the quoted words, why __stdcall results in smaller executable than __cdecl?
That appears to not be true. I tested it by compiling libudis with and without the stdcall calling convention. First without:
$ clang -target i386-pc-win32 -DHAVE_CONFIG_H -Os -I.. -I/usr/include -fPIC -c *.c && strip *.o
$ du -cb *.o
6524 decode.o
95932 itab.o
1434 syn-att.o
1706 syn-intel.o
2288 syn.o
1245 udis86.o
109129 totalt
And with. It is the -mrtd switch that enables stdcall:
$ clang -target i386-pc-win32 -DHAVE_CONFIG_H -Os -I.. -I/usr/include -fPIC -mrtd -c *.c && strip *.o
$ du -cb *.o
7084 decode.o
95932 itab.o
1502 syn-att.o
1778 syn-intel.o
2296 syn.o
1305 udis86.o
109897 totalt
As you can see, cdecl beats stdcall by a few hundred bytes (109129 versus 109897, a difference of 768). It could be that my testing methodology is flawed, or that clang's stdcall code generator is weak. But I think that with modern compilers the extra flexibility afforded by caller cleanup means that they will always generate better code with cdecl than with stdcall.