可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效,请关闭广告屏蔽插件后再试):
问题:
I am trying to understand how calculations involving numbers greater than 232 happen on a 32 bit machine.
C code
$ cat size.c
#include<stdio.h>
#include<math.h>
int main() {
printf ("max unsigned long long = %llu\n",
(unsigned long long)(pow(2, 64) - 1));
}
$
gcc output
$ gcc size.c -o size
$ ./size
max unsigned long long = 18446744073709551615
$
Corresponding assembly code
$ gcc -S size.c -O3
$ cat size.s
.file "size.c"
.section .rodata.str1.4,"aMS",@progbits,1
.align 4
.LC0:
.string "max unsigned long long = %llu\n"
.text
.p2align 4,,15
.globl main
.type main, @function
main:
pushl %ebp
movl %esp, %ebp
andl $-16, %esp
subl $16, %esp
movl $-1, 8(%esp) #1
movl $-1, 12(%esp) #2
movl $.LC0, 4(%esp) #3
movl $1, (%esp) #4
call __printf_chk
leave
ret
.size main, .-main
.ident "GCC: (Ubuntu 4.4.3-4ubuntu5) 4.4.3"
.section .note.GNU-stack,"",@progbits
$
What exactly happens on the lines 1 - 4?
Is this some kind of string concatenation at the assembly level?
回答1:
__printf_chk
is a wrapper around printf
which checks for stack overflow, and takes an additional first parameter, a flag (e.g. see here.)
pow(2, 64) - 1
has been optimised to 0xffffffffffffffff
as the arguments are constants.
As per the usual calling conventions, the first argument to __printf_chk()
(int flag
) is a 32-bit value on the stack (at %esp
at the time of the call
instruction). The next argument, const char * format
, is a 32-bit pointer (the next 32-bit word on the stack, i.e. at %esp+4
). And the 64-bit quantity that is being printed occupies the next two 32-bit words (at %esp+8
and %esp+12
):
pushl %ebp ; prologue
movl %esp, %ebp ; prologue
andl $-16, %esp ; align stack pointer
subl $16, %esp ; reserve bytes for stack frame
movl $-1, 8(%esp) #1 ; store low half of 64-bit argument (a constant) to stack
movl $-1, 12(%esp) #2 ; store high half of 64-bit argument (a constant) to stack
movl $.LC0, 4(%esp) #3 ; store address of format string to stack
movl $1, (%esp) #4 ; store "flag" argument to __printf_chk to stack
call __printf_chk ; call routine
leave ; epilogue
ret ; epilogue
The compiler has effectively rewritten this:
printf("max unsigned long long = %llu\n", (unsigned long long)(pow(2, 64) - 1));
...into this:
__printf_chk(1, "max unsigned long long = %llu\n", 0xffffffffffffffffULL);
...and, at runtime, the stack layout for the call looks like this (showing the stack as 32-bit words, with addresses increasing from the bottom of the diagram upwards):
: :
: Stack :
: :
+-----------------+
%esp+12 | 0xffffffff | \
+-----------------+ } <-------------------------------------.
%esp+8 | 0xffffffff | / |
+-----------------+ |
%esp+4 |address of string| <---------------. |
+-----------------+ | |
%esp | 1 | <--. | |
+-----------------+ | | |
__printf_chk(1, "max unsigned long long = %llu\n", |
0xffffffffffffffffULL);
回答2:
similar to the way as we handle numbers greater than 9, with only digits 0 - 9.
(using positional digits). presuming the question is a conceptual one.
回答3:
In your case, the compiler knows that 2^64-1 is just 0xffffffffffffffff, so it has pushed -1 (low dword) and -1 (high dword) onto the stack as your argument to printf. It's just an optimization.
In general, 64-bit numbers (and even greater values) can be stored with multiple words, e.g. an unsigned long long
uses two dword
s. To add two 64-bit numbers, two additions are performed - one on the low 32 bits, and one on the high 32 bits, plus the carry:
; Add 64-bit number from esi onto edi:
mov eax, [esi] ; get low 32 bits of source
add [edi], eax ; add to low 32 bits of destination
; That add may have overflowed, and if it did, carry flag = 1.
mov eax, [esi+4] ; get high 32 bits of source
adc [edi+4], eax ; add to high 32 bits of destination, then add carry.
You can repeat this sequence of add
and adc
s as much as you like to add arbitrarily big numbers. The same thing can be done with subtraction - just use sub
and sbb
(subtract with borrow).
Multiplication and division are much more complicated, and the compiler usually produces some small helper functions to deal with these whenever you multiply 64-bit numbers together. Packages like GMP which support very, very large integers use SSE/SSE2 to speed things up. Take a look at this Wikipedia article for more information on multiplication algorithms.
回答4:
As others have pointed out all 64-bit aritmetic in your example has been optimised away. This answer focuses on the question int the title.
Basically we treat each 32-bit number as a digit and work in base 4294967296. In this manner we can work on arbiterally big numbers.
Addition and subtraction are easiest. We work through the digits one at a time starting from the least significant and moving to the most significant. Generally the first digit is done with a normal add/subtract instruction and later digits are done using a specific "add with carry" or "subtract with borrow" instruction. The carry flag in the status register is used to take the carry/borrow bit from one digit to the next. Thanks to twos complement signed and unsigned addition and subtraction are the same.
Multiplication is a little trickier, multiplying two 32-bit digits can produce a 64-bit result. Most 32-bit processors will have instructions that multiply two 32-bit numbers and produces a 64-bit result in two registers. Addition will then be needed to combine the results into a final answer. Thanks to twos complement signed and unsigned multiplication are the same provided the desired result size is the same as the argument size. If the result is larger than the arguments then special care is needed.
For comparision we start from the most significant digit. If it's equal we move down to the next digit until the results are equal.
Division is too complex for me to describe in this post, but there are plenty of examples out there of algorithms. e.g. http://www.hackersdelight.org/hdcodetxt/divDouble.c.txt
Some real-world examples from gcc https://godbolt.org/g/NclqXC , the assembler is in intel syntax.
First an addition. adding two 64-bit numbers and producing a 64-bit result. The asm is the same for both signed and unsigned versions.
int64_t add64(int64_t a, int64_t b) { return a + b; }
add64:
mov eax, DWORD PTR [esp+12]
mov edx, DWORD PTR [esp+16]
add eax, DWORD PTR [esp+4]
adc edx, DWORD PTR [esp+8]
ret
This is pretty simple, load one argument into eax and edx, then add the other using an add followed by an add with carry. The result is left in eax and edx for return to the caller.
Now a multiplication of two 64-bit numbers to produce a 64-bit result. Again the code doesn't change from signed to unsigned. I've added some comments to make it easier to follow.
Before we look at the code lets consider the math. a and b are 64-bit numbers I will use lo() to represent the lower 32-bits of a 64-bit number and hi() to represent the upper 32 bits of a 64-bit number.
(a * b) = (lo(a) * lo(b)) + (hi(a) * lo(b) * 2^32) + (hi(b) * lo(a) * 2^32) + (hi(b) * hi(a) * 2^64)
(a * b) mod 2^64 = (lo(a) * lo(b)) + (lo(hi(a) * lo(b)) * 2^32) + (lo(hi(b) * lo(a)) * 2^32)
lo((a * b) mod 2^64) = lo(lo(a) * lo(b))
hi((a * b) mod 2^64) = hi(lo(a) * lo(b)) + lo(hi(a) * lo(b)) + lo(hi(b) * lo(a))
uint64_t mul64(uint64_t a, uint64_t b) { return a*b; }
mul64:
push ebx ;save ebx
mov eax, DWORD PTR [esp+8] ;load lo(a) into eax
mov ebx, DWORD PTR [esp+16] ;load lo(b) into ebx
mov ecx, DWORD PTR [esp+12] ;load hi(a) into ecx
mov edx, DWORD PTR [esp+20] ;load hi(b) into edx
imul ecx, ebx ;ecx = lo(hi(a) * lo(b))
imul edx, eax ;edx = lo(hi(b) * lo(a))
add ecx, edx ;ecx = lo(hi(a) * lo(b)) + lo(hi(b) * lo(a))
mul ebx ;eax = lo(low(a) * lo(b))
;edx = hi(low(a) * lo(b))
pop ebx ;restore ebx.
add edx, ecx ;edx = hi(low(a) * lo(b)) + lo(hi(a) * lo(b)) + lo(hi(b) * lo(a))
ret
Finally when we try a division we see.
int64_t div64(int64_t a, int64_t b) { return a/b; }
div64:
sub esp, 12
push DWORD PTR [esp+28]
push DWORD PTR [esp+28]
push DWORD PTR [esp+28]
push DWORD PTR [esp+28]
call __divdi3
add esp, 28
ret
The compiler has decided that division is too complex to implement inline and instead calls a library routine.
回答5:
The compiler actually made a static optimization of your code.
lines #1 #2 #3 are parameters for printf()
回答6:
As @Pafy mentions, the compiler has evaluated this as a constant.
2 to the 64th minus 1 is 0xffffffffffffffff
.
As 2 32-bit integers this is: 0xffffffff
and 0xffffffff
,
which if you take that as a pair of 32-bit signed types, ends up as: -1
, and -1
.
Thus for your compiler the code generated happens to be equivalent to:
printf("max unsigned long long = %llu\n", -1, -1);
In the assembly it's written like this:
movl $-1, 8(%esp) #Second -1 parameter
movl $-1, 12(%esp) #First -1 parameter
movl $.LC0, 4(%esp) #Format string
movl $1, (%esp) #A one. Kind of odd, perhaps __printf_chk
#in your C library expects this.
call __printf_chk
By the way, a better way to calculate powers of 2 is to shift 1
left. Eg. (1ULL << 64) - 1
.
回答7:
No one in this thread noticed that the OP asked to explain the first 4 lines, not lines 11-14.
The first 4 lines are:
.file "size.c"
.section .rodata.str1.4,"aMS",@progbits,1
.align 4
.LC0:
Here's what happens in first 4 lines:
.file "size.c"
This is an assembler directive that says that we are about to start a new logical file called "size.c".
.section .rodata.str1.4,"aMS",@progbits,1
This is also a directive for read only strings in the program.
.align 4
This directive sets the location counter to always be a multiple of 4.
.LC0:
This is a label LC0
that can be jumped to, for example.
I hope I provided the right answer to the question as I answered exactly what OP asked.