Calling C function which takes no parameters with

Posted 2019-05-11 15:18

Question:

I have a strange question about what is probably undefined behavior, involving the C calling convention and 64-bit versus 32-bit compilation. First, here is my code:

int f() { return 0; }

int main()
{
    int x = 42;
    return f(x);
}

As you can see, I am calling f with an argument although f takes no parameters. My first question is whether this argument is really passed to f when it is called.

The mysterious lines

After running objdump on the binary I got curious results. When passing x as an argument to f:

00000000004004b6 <f>:
  4004b6:   55                      push   %rbp
  4004b7:   48 89 e5                mov    %rsp,%rbp
  4004ba:   b8 00 00 00 00          mov    $0x0,%eax
  4004bf:   5d                      pop    %rbp
  4004c0:   c3                      retq   

00000000004004c1 <main>:
  4004c1:   55                      push   %rbp
  4004c2:   48 89 e5                mov    %rsp,%rbp
  4004c5:   48 83 ec 10             sub    $0x10,%rsp
  4004c9:   c7 45 fc 2a 00 00 00    movl   $0x2a,-0x4(%rbp)
  4004d0:   8b 45 fc                mov    -0x4(%rbp),%eax
  4004d3:   89 c7                   mov    %eax,%edi
  4004d5:   b8 00 00 00 00          mov    $0x0,%eax
  4004da:   e8 d7 ff ff ff          callq  4004b6 <f>
  4004df:   c9                      leaveq 
  4004e0:   c3                      retq   
  4004e1:   66 2e 0f 1f 84 00 00    nopw   %cs:0x0(%rax,%rax,1)
  4004e8:   00 00 00 
  4004eb:   0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)

Without passing x as an argument:

00000000004004b6 <f>:
  4004b6:   55                      push   %rbp
  4004b7:   48 89 e5                mov    %rsp,%rbp
  4004ba:   b8 00 00 00 00          mov    $0x0,%eax
  4004bf:   5d                      pop    %rbp
  4004c0:   c3                      retq   

00000000004004c1 <main>:
  4004c1:   55                      push   %rbp
  4004c2:   48 89 e5                mov    %rsp,%rbp
  4004c5:   48 83 ec 10             sub    $0x10,%rsp
  4004c9:   c7 45 fc 2a 00 00 00    movl   $0x2a,-0x4(%rbp)
  4004d0:   b8 00 00 00 00          mov    $0x0,%eax
  4004d5:   e8 dc ff ff ff          callq  4004b6 <f>
  4004da:   c9                      leaveq 
  4004db:   c3                      retq   
  4004dc:   0f 1f 40 00             nopl   0x0(%rax)

So as we can see:

  4004d0:   8b 45 fc                mov    -0x4(%rbp),%eax
  4004d3:   89 c7                   mov    %eax,%edi

appear only when I call f with x, but because I am not very good at assembly I don't really understand these lines.

The 64/32 bits paradox

I also tried something else and started printing the stack of my program.

Stack with x given to f (compiled in 64bits):

Address of x: ffcf115c
  ffcf1128:          0          0
  ffcf1130:   -3206820          0
  ffcf1138:   -3206808  134513826
  ffcf1140:         42   -3206820
  ffcf1148: -145495616  134513915
  ffcf1150:          1   -3206636
  ffcf1158:   -3206628         42
  ffcf1160: -143903780   -3206784

Stack with x not given to f (compiled in 64bits):

Address of x: 3c19183c
  3c191818:          0          0
  3c191820: 1008277568      32766
  3c191828:    4195766          0
  3c191830: 1008277792      32766
  3c191838:          0         42
  3c191840:    4195776          0

And for some reason, in 32 bits x seems to be pushed onto the stack.

Stack with x given to f (compiled in 32bits):

Address of x: ffdc8eac
  ffdc8e78:          0          0
  ffdc8e80:   -2322772          0
  ffdc8e88:   -2322760  134513826
  ffdc8e90:         42   -2322772
  ffdc8e98: -145086016  134513915
  ffdc8ea0:          1   -2322588
  ffdc8ea8:   -2322580         42
  ffdc8eb0: -143494180   -2322736

Why the hell does x appear on the stack in 32 bits but not in 64 bits?

Code for printing: http://paste.awesom.eu/yayg/QYw6&ln

Why am I asking such stupid questions?

  • First, because I didn't find any standard that answers my question.
  • Second, think about calling a variadic function in C without passing the number of arguments.
  • Last but not least, I think undefined behavior is fun.

Thank you for taking the time to read this far, and for helping me understand something or making me realize that my questions are pointless.

Answer 1:

The answer is that, as you suspect, what you are doing is undefined behavior (in the case where the superfluous argument is passed).

The actual behavior in many implementations is harmless, however. An argument is prepared on the stack and is ignored by the called function. The called function is not responsible for removing arguments from the stack, so there is no harm (such as an unbalanced stack pointer).

This harmless behavior was what enabled C hackers to develop, once upon a time, a variable argument list facility that used to be under #include <varargs.h> in ancient versions of the Unix C library.

This evolved into the ANSI C <stdarg.h>.

The idea was: pass extra arguments into a function, and then march through the stack dynamically to retrieve them.

That won't work today. For instance, as you can see, the parameter is not in fact put on the stack, but loaded into the RDI register (your listing writes EDI, its low 32 bits). This is the convention GCC follows on x86-64 Linux, the System V AMD64 ABI. If you march through the stack, you won't find the first several parameters. On IA-32, by contrast, GCC passes parameters on the stack, though you can get register-based behavior with the "fastcall" convention.

The va_arg macro from <stdarg.h> will correctly take into account the mixed register/stack parameter passing convention. (How it does so depends on the ABI: a variadic function may, for example, spill the argument registers to memory on entry, or the ABI may require the trailing arguments to be passed in memory in the first place, so that va_arg can just march through memory.)
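As a minimal sketch of the <stdarg.h> approach described above, here is a variadic function that sums its int arguments. The name sum_ints and the leading count parameter are illustrative assumptions, not part of the question's code; the point is that the callee cannot discover how many arguments were passed, so the caller has to say so explicitly.

```c
#include <assert.h>
#include <stdarg.h>

/* Hypothetical helper: sums `count` int arguments that follow `count`.
 * The ellipsis in the prototype tells the compiler to use the variadic
 * calling convention at every call site. */
int sum_ints(int count, ...)
{
    assert(count >= 0);

    va_list ap;
    int total = 0;

    va_start(ap, count);            /* start after the last named parameter */
    for (int i = 0; i < count; i++)
        total += va_arg(ap, int);   /* fetch the next argument as an int */
    va_end(ap);

    return total;
}
```

With this declaration visible, a call such as sum_ints(3, 10, 20, 12) is well defined and returns 42, whether the ABI passes those values in registers or on the stack.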

P.S. your machine code might be easier to follow if you added some optimization. For instance, the sequence

  4004c9:   c7 45 fc 2a 00 00 00    movl   $0x2a,-0x4(%rbp)
  4004d0:   8b 45 fc                mov    -0x4(%rbp),%eax
  4004d3:   89 c7                   mov    %eax,%edi
  4004d5:   b8 00 00 00 00          mov    $0x0,%eax

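is fairly hard to follow because of what look like wasteful data moves: without optimization, x is stored to the stack, reloaded into EAX, and only then copied into EDI. The final mov $0x0,%eax is not waste, though: because f is declared without a prototype, the caller sets AL to the number of vector registers used for arguments, as the x86-64 ABI requires for calls that might be variadic.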



Answer 2:

How arguments are passed to a function is dependent on the platform ABI (application binary interface). The ABI makes it possible to compile libraries with compiler X and use them with code compiled with compiler Y. None of this is defined by the standard.

There is no requirement by the standard that a "stack" even exist, much less that it be used for function calling.

The x86 chips had limited numbers of registers, and the ABI reflects that fact; the normal 32-bit x86 calling convention uses the stack for all arguments.

That is not the case with the 64-bit architecture, which has many more registers and uses some of them for the first few parameters. This significantly speeds up function calls.

Similarly, the Windows 32-bit "fastcall" calling convention passes a few arguments in registers. (In order to use a non-standard calling convention, you need to appropriately annotate the function declaration, and do so consistently where it is defined.)

You can find more information on various calling conventions in this Wikipedia article. The AMD64 ABI can be found on x86-64.org (PDF document). The original System V IA-32 ABI (the basis of the ABI used on Linux, xBSD and OS X) can still be accessed from www.sco.com (PDF document).


Undefined behaviour?

The code presented in the OP is definitely undefined behaviour.

  1. In a function definition, an empty parameter list means that the function does not take any arguments. In a function declaration that is not a definition, an empty parameter list leaves the number and types of the arguments unspecified.

    §6.7.6.3/p.14: An empty list in a function declarator that is part of a definition of that function specifies that the function has no parameters. The empty list in a function declarator that is not part of a definition of that function specifies that no information about the number or types of the parameters is supplied.

  2. When the function is eventually called, it must be called with the correct number of parameters:

    §6.5.2.2/p.6: If the expression that denotes the called function has a type that does not include a prototype, the integer promotions are performed on each argument, and arguments that have type float are promoted to double... If the number of arguments does not equal the number of parameters, the behavior is undefined.

  3. If the function is defined as a vararg function (with a trailing ellipsis), the vararg declaration must be visible wherever the function is called.

    (Continuing from previous quote): If the function is defined with a type that includes a prototype, and either the prototype ends with an ellipsis (, ...) or the types of the arguments after promotion are not compatible with the types of the parameters, the behavior is undefined.
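One way to turn this undefined behaviour into a compile-time error, sketched here under the rules quoted above, is to give f a real prototype by writing (void) instead of an empty list. The wrapper name call_f is a hypothetical stand-in for the question's main.

```c
/* (void) makes this a prototype: f takes exactly zero arguments. */
int f(void) { return 0; }

/* Hypothetical caller, standing in for main() in the question. */
int call_f(void)
{
    int x = 42;
    (void)x;        /* x is deliberately left unused here */

    /* return f(x);    would now be rejected: too many arguments to f */
    return f();
}
```

With the prototype in scope the compiler knows the parameter count, so the mismatched call is diagnosed instead of silently loading x into a register or pushing it on the stack.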