I found this:
Because the stack is cleaned by the called function, the __stdcall calling convention creates smaller executables than __cdecl, in which the code for stack cleanup must be generated for each function call.
Suppose I have two functions:
void __cdecl func1(int x)
{
    // do some stuff using x
}
void __stdcall func2(int x, int y)
{
    // do some stuff using x, y
}
and here they are called in main():
int main()
{
    func1(5);
    func2(5, 6);
}
IMO, it is main()'s responsibility to clean up the stack of the call to func1(5), and func2 will clean up the stack of the call to func2(5, 6), right?
Four questions:
1. For the call to func1 in main(), it's main's responsibility to clean up the stack, so will the compiler insert some code (code to clean up the stack) before and after the call to func1? Like this:
int main()
{
    before_call_to_cdecl_func(); // compiler generated code for stack-clean-up of cdecl-func-call
    func1(5);
    after_call_to_cdecl_func();  // compiler generated code for stack-clean-up of cdecl-func-call
    func2(5, 6);
}
2. For the call to func2 in main(), it's func2's own job to clean up the stack, so I presume no code will be inserted in main() before or after the call to func2, right?
3. Because func2 is __stdcall, I presume the compiler will automatically insert code (to clean up the stack) like this:
void __stdcall func2(int x, int y)
{
    before_call_to_stdcall_func(); // compiler generated code for stack-clean-up of stdcall-func-call
    // do some stuff using x, y
    after_call_to_stdcall_func();  // compiler generated code for stack-clean-up of stdcall-func-call
}
Do I presume right?
4. Finally, back to the quoted words: why does __stdcall result in a smaller executable than __cdecl? And there is no such thing as __stdcall on Linux, right? Does that mean a Linux ELF will always be larger than a Windows EXE?
Historically, the first C++ compilers used the equivalent of __stdcall. From a quality of implementation point of view, I'd expect the C compiler to use the __cdecl conventions, and the C++ compiler the __stdcall conventions (which were known as the Pascal conventions back then). This is one thing that the early Zortech compilers got right.
Of course, vararg functions must still use __cdecl conventions. The callee can't clean up the stack if it doesn't know how much to clean up.
(Note that the C standard was carefully designed to allow the __stdcall conventions in C as well. I only know of one compiler which took advantage of this, however; the amount of existing code at the time which called vararg functions without a prototype in view was enormous, and while the standard declared it broken, compiler implementors didn't want to break their clients' code.)
In a lot of milieus, there seems to be a very strong tendency to insist that the C and the C++ conventions be the same, so that one can take the address of an extern "C++" function and pass it to a function written in C which calls it. IIRC, for example, g++ doesn't treat extern "C" and extern "C++" functions as having two different types (although the standard requires it), and allows passing the address of a static member function to pthread_create. The result is that such compilers use the exact same conventions everywhere, and on Intel, they are the equivalent of __cdecl.
Many compilers have extensions to support other conventions. (Why they don't use the standard extern "xxx", I don't know.) The syntax for these extensions is very varied, however. Microsoft puts the attribute directly before the function name, while g++ puts it in a special attribute clause after the function declaration.
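A minimal sketch of the two vendor spellings, hidden behind a macro so that the same declaration compiles with either compiler. The macro name CALLCONV_STDCALL and the function scaled_sum are made up for this illustration. The sketch assumes that the macro should expand to nothing on anything other than 32-bit x86, where the annotation has no effect.

```cpp
// CALLCONV_STDCALL is a hypothetical name chosen for this sketch;
// neither compiler defines it.
#if defined(_MSC_VER)
#  define CALLCONV_STDCALL __stdcall                 // Microsoft keyword
#elif defined(__GNUC__) && defined(__i386__)
#  define CALLCONV_STDCALL __attribute__((stdcall))  // GNU attribute, 32-bit x86 only
#else
#  define CALLCONV_STDCALL                           // assumed to be a no-op elsewhere
#endif

// Microsoft position: directly before the function name.
// g++ accepts its attribute in this position as well.
int CALLCONV_STDCALL scaled_sum(int x, int y);

#if defined(__GNUC__) && defined(__i386__)
// g++ position described above: attribute clause after the declaration.
int scaled_sum(int x, int y) __attribute__((stdcall));
#endif

// Hypothetical example function; the body is irrelevant to the convention.
int CALLCONV_STDCALL scaled_sum(int x, int y)
{
    return 2 * (x + y);
}
```

Callers just write scaled_sum(5, 6); the convention is part of the declaration, so nothing changes at the call site in the source.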
C++11 has added a standard way of specifying attributes, using double square brackets ([[...]]).
It doesn't specify stdcall as an attribute, but it does specify that additional attributes (other than those defined in the standard) may be specified, and are implementation dependent. I expect that both g++ and VC++ accept this syntax in their most recent versions, at least if C++11 is activated. The exact name of the attribute (__stdcall, stdcall, etc.) may vary, however, so you probably want to wrap this in a macro.
Finally: in a modern compiler with optimization turned on, the difference in the calling conventions is probably negligible. Attributes like const (not to be confused with the C++ keyword const), regparm or noreturn will probably have a larger impact, both in terms of executable size and performance.
__stdcall generates no cleanup code at the call site; however, it should be noted that compilers can accrue stack cleanup from multiple __cdecl calls into one cleanup, or delay the cleanup to prevent pipeline stalls.
For a __cdecl function, setting up the function arguments is something different (different compilers generate/prefer different methods).
__stdcall was more a Windows thing. The size of the binary depends on the number of calls to the __cdecl functions: more calls means more cleanup code, whereas __stdcall has only a single instance of the cleanup code. However, you shouldn't see much of a size increase, as at most you have a few bytes per call.
*It's important to distinguish between cleanup and setting up call parameters.
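To make the difference concrete, here is a small sketch with the instructions a 32-bit x86 compiler typically emits shown as comments. The macro names (CC_CDECL, CC_STDCALL) and the functions are invented for the example. The assembly in the comments is illustrative only: an optimizing compiler may inline the calls, merge several cleanups into one, or delay them. On other targets the macros are assumed to expand to nothing.

```cpp
#include <cstdarg>

// Hypothetical macros for this sketch only.
#if defined(_MSC_VER)
#  define CC_CDECL   __cdecl
#  define CC_STDCALL __stdcall
#elif defined(__GNUC__) && defined(__i386__)
#  define CC_CDECL   __attribute__((cdecl))
#  define CC_STDCALL __attribute__((stdcall))
#else
#  define CC_CDECL
#  define CC_STDCALL
#endif

// Caller cleans up: the function itself ends with a plain `ret`.
int CC_CDECL twice(int x) { return 2 * x; }

// Callee cleans up: the function ends with `ret 8` (two 4-byte arguments).
int CC_STDCALL add(int x, int y) { return x + y; }

// Variadic: only the caller knows how many arguments it pushed,
// so caller cleanup is the only option here.
int CC_CDECL sum_n(int count, ...)
{
    va_list args;
    va_start(args, count);
    int total = 0;
    for (int i = 0; i < count; ++i)
        total += va_arg(args, int);
    va_end(args);
    return total;
}

int demo()
{
    int a = twice(5);           // push 5; call twice; add esp, 4    <- cleanup at the call site
    int b = add(5, 6);          // push 6; push 5; call add          <- nothing after the call
    int c = sum_n(3, 1, 2, 3);  // four pushes; call sum_n; add esp, 16
    return a + b + c;           // 10 + 11 + 6
}
```

Pushing the arguments happens at the call site under both conventions; only the `add esp, N` after the call is specific to caller cleanup. With a small immediate that instruction encodes in three bytes, which is where the "few bytes per call" comes from, while the callee-cleanup function pays once with `ret 8` (three bytes) instead of `ret` (one byte).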
This crowd of calling conventions has been made history by the new 64-bit ABI.
http://en.wikipedia.org/wiki/X86_calling_conventions#x86-64_calling_conventions
There is also the ABI side of things for different architectures (like ARM). Not everything executes the same way on all architectures, so do not bother thinking about this calling convention thing!
http://en.wikipedia.org/wiki/Calling_convention
The EXE size improvement is insignificant (maybe nonexistent), so do not bother...
__cdecl is much more flexible than __stdcall: it allows a variable number of arguments, the cleanup code (a single instruction) is insignificant, and a __cdecl function can be called with the wrong number of arguments without this necessarily causing a serious problem! The same situation with __stdcall always goes wrong!
Others have answered the other parts of your question, so I'll just add my answer about the size:
That appears to not be true. I tested it by compiling libudis with and without the stdcall calling convention. First without:
And with. It is the -mrtd switch that enables stdcall:
As you can see, cdecl beats stdcall by a few hundred bytes. It could be that my testing methodology is flawed, or that clang's stdcall code generator is weak. But I think that with modern compilers the extra flexibility afforded by caller cleanup means that they will always generate better code with cdecl than with stdcall.