Why does printf print a random value with a float argument and an integer format specifier?

Posted 2020-02-10 08:38

Question:

I wrote a simple program on a 64-bit machine:

#include <stdio.h>

int main() {
    printf("%d", 2.443);  /* %d expects an int, but 2.443 is a double */
}

This is how I expect the compiler to behave: it identifies the second argument as a double, so it pushes 8 bytes onto the stack, or possibly passes the value in a register instead. %d expects a 4-byte integer, so printf prints some garbage value.

What is interesting is that the value printed changes every time I run this program. So what is happening? I expected it to print the same garbage value every time, not a different one on each run.

Answer 1:

It's undefined behaviour, of course, to pass arguments not corresponding to the format, so the language cannot tell us why the output changes. We must look at the implementation, what code it produces, and possibly the operating system too.

My setup is different from yours,

Linux 3.1.10-1.16-desktop x86_64 GNU/Linux (openSuSE 12.1)

with gcc-4.6.2, but it is similar enough that it's reasonable to suspect the same mechanisms are at work.

Looking at the generated assembly (-O3, out of habit), the relevant part (main) is:

.cfi_startproc
subq    $8, %rsp             # adjust stack pointer (keeps it 16-byte aligned for the call)
.cfi_def_cfa_offset 16
movl    $.LC1, %edi          # move address of format string to edi (first argument)
movl    $1, %eax             # al = number of vector registers used by this variadic call
movsd   .LC0(%rip), %xmm0    # move the double into xmm0, the first floating-point argument register
call    printf
xorl    %eax, %eax           # clear eax (return 0)
addq    $8, %rsp             # restore stack pointer
.cfi_def_cfa_offset 8
ret                          # return

If I pass an int instead of the double, not much changes, but the change is significant:

movl    $47, %esi            # move int to esi
movl    $.LC0, %edi          # format string
xorl    %eax, %eax           # clear eax
call    printf

I have looked at the generated code for many variations of the types and number of arguments passed to printf, and consistently, the first few double (or promoted float) arguments are passed in xmmN, N = 0, 1, 2, ... (up to xmm7), while integer arguments (int, char, long, regardless of signedness) are passed in esi, edx, ecx, r8d, r9d (the full 64-bit registers for long) and then on the stack.

So I venture the guess that printf looks for the announced int in esi, and prints whatever happens to be there.

Whether the contents of esi are in any way predictable when nothing is moved there in main, and what they might signify, I have no idea.



Answer 2:

This answer attempts to address some of the sources of variation. It is a follow-up to Daniel Fischer’s answer and some comments on it.

As I do not work with Linux, I cannot give a definitive answer. For a printf later in a large application, there would be a myriad of sources of potential variation. This early in a small application, there should be only a few.

Address space layout randomization (ASLR) is one: the operating system deliberately rearranges some memory randomly to prevent malware from knowing what addresses to use. I do not know if Linux 3.4.4-2 has this.

Another is environment variables. Your shell's environment variables are copied into the processes it spawns (and are accessible through the getenv routine). A few of those might change automatically, so they would have slightly different values. This is unlikely to directly affect what printf sees when it attempts to use a missing integer argument, but there could be cascading effects.

There may be a shared-library loader that runs either before main is called or before printf is called. For example, if printf is in a shared library rather than built into your executable file, then a call to printf likely results in a call to a stub routine that calls the loader. The loader looks up the shared library, finds the module containing printf, loads that module into your process’ address space, changes the stub so that it calls the newly loaded printf directly in the future (instead of calling the loader), and calls printf. As you can imagine, that can be a fairly extensive process and involves, among other things, finding and reading files on disk (the directories along the path to the shared library, and the library itself). It is conceivable that some caching or file operations on your system result in slightly different behavior in the loader.

So far, I favor ASLR as the most likely candidate of the ones above. The latter two are likely to be fairly stable; the values involved would usually change occasionally, not frequently. ASLR would change each time, and simply leaving an address in a register would suffice to explain the printf behavior.

Here is an experiment: after the initial printf, add a declaration and another printf, as in this code:

printf("%d\n", 2.443);
int a;
printf("%p\n", (void *) &a);

The second printf prints the address of a, which is likely on the stack. Run the program two or three times and calculate the difference between the value printed by the first printf and the value printed by the second printf. (The second printf is likely to print in hexadecimal, so it might be convenient to change the first to "%x" to make it hexadecimal too.) If the value printed by the second printf varies from run to run, then your program is experiencing ASLR. If the values change from run to run but the difference between them remains constant, then the value that printf has happened upon in the first printf is some address in your process that was left lying around after program initialization.

If the address of a changes but the difference does not remain constant, you might try changing int a; to static int a; to see if comparing the first value to a different part of your address space yields a better result.

Naturally, none of this is useful for writing reliable programs; it is just educational with regard to how program loading and initialization works.