[I globally edited the question to be more "useful" and clear]
I was wondering about the complexity of the implementation of the function exp
in cmath.
By complexity, I mean algorithmic complexity if possible. Otherwise cost compared to a floating point operation (addition for example)
The following lines :
double x = 3;
double y = std::exp(x);
compile to :
...
19,23d16
movq %rax, -40(%rbp)
movsd -40(%rbp), %xmm0
call exp
movsd %xmm0, -40(%rbp)
movq -40(%rbp), %rax
...
exp
has to be dynamically loaded at runtime but I can't find many information on the implementation algorithmic complexity. It seems there's no call to a special processor instruction (at least on my x86_64 platform with gcc) so there must be an implementation somewhere that I can't find.
On my mind, the algorithm is likely to use the binary representation of the input to have a very weak complexity but I haven't been able to find a valuable reference on this topic.
Maybe speaking of algorithmical complexity is not really possible in this case and all what we can do is testing (cf. answers below) but I don't know how we can quantify objectively the difference between a floating point operation and a call to exp ?
It seems that the complexity is actually constant since MSVC9 compiler does some bit-magic involving specific tables, bitmasks and biases. As there are few branches after all instruction pipeline should help a lot. Below is what it does actually.
unpcklpd xmm0,xmm0
movapd xmm1,xmmword ptr [cv]
movapd xmm6,xmmword ptr [Shifter]
movapd xmm2,xmmword ptr [cv+10h]
movapd xmm3,xmmword ptr [cv+20h]
pextrw eax,xmm0,3
and eax,7FFFh
mov edx,408Fh
sub edx,eax
sub eax,3C90h
or edx,eax
cmp edx,80000000h
jae RETURN_ONE
mulpd xmm1,xmm0
addpd xmm1,xmm6
movapd xmm7,xmm1
subpd xmm1,xmm6
mulpd xmm2,xmm1
movapd xmm4,xmmword ptr [cv+30h]
mulpd xmm3,xmm1
movapd xmm5,xmmword ptr [cv+40h]
subpd xmm0,xmm2
movd eax,xmm7
mov ecx,eax
and ecx,3Fh
shl ecx,4
sar eax,6
mov edx,eax
subpd xmm0,xmm3
movapd xmm2,xmmword ptr Tbl_addr[ecx]
mulpd xmm4,xmm0
movapd xmm1,xmm0
mulpd xmm0,xmm0
addpd xmm5,xmm4
mulsd xmm0,xmm0
addsd xmm1,xmm2
unpckhpd xmm2,xmm2
movdqa xmm6,xmmword ptr [mmask]
pand xmm7,xmm6
movdqa xmm6,xmmword ptr [bias]
paddq xmm7,xmm6
psllq xmm7,2Eh
mulpd xmm0,xmm5
addsd xmm1,xmm0
orpd xmm2,xmm7
unpckhpd xmm0,xmm0
addsd xmm0,xmm1
add edx,37Eh
cmp edx,77Ch
ja ADJUST
mulsd xmm0,xmm2
sub esp,10h
addsd xmm0,xmm2
movlpd qword ptr [esp+4],xmm0
fld qword ptr [esp+4]
add esp,10h
ret
Speaking generally, the complexity for primitive types should be very fast. As the commenters mention, there are sometimes instructions for it, and if not there are well-known fast algorithms, Knuth has a good section on such things.
The usual implementation for exponentiation is square-and-multiply which makes use of the observation that you can break any exponentiation up into some number of squarings plus at most one more multiply. The basic algorithm for n**k
is given here and is O( lg k).
Here one can find a fast exp
implementation that uses the SSE
instructions.
Are you interested in the time taken by exponentiation, as compared to time taken by other floating-point operations? That will vary from implementation to implementation, and also from computer to computer (one may have a different math processor), so there's no one answer we can give.
The right approach, if you want to know, is to write test functions and time them. Loop through a million floating-point assignments and time it, then loop through a million floating-point assignments of exponentials and time that, then subtract. Watch out for the optimizer on this one, as if you don't use the result of the assignments it's allowed to remove the whole loop. You'll know that by extremely fast runtimes that don't vary with the size of the loop.