Question:
I want to know if there is any difference in speed between the following two methods
#include <math.h>
#include <iostream>
using namespace std;

int main()
{
    float k = 7;
    float b = 4;
    cout << k / b << " " << endl;
    cout << k * powf(b, -1) << " " << endl;
    return 0;
}
- trivial division: k/b
- using multiplication: k*b^(-1)

I think that in the second method there is no actual division operation, so I think the second is faster. Am I right?
Answer 1:
Calculating b^(-1) isn't trivial, and will sometimes (though not always) be converted to 1/b by an optimizer. Just as easily, an optimizer might replace *b^(-1) with /b. This is probably hardware-dependent (and possibly even value-dependent, if the values are known at compile time).
If you want to test it, put each one in a loop that runs many times (don't print anything to the screen), and time each execution.
Also note that in some cases these two methods will give you slightly different results, which matters if you care about precise computation.
Answer 2:
Chances are pretty good that you're wrong. Even if the compiler can generate powf as an intrinsic (i.e., generate inline code for it), b^(-1) is equivalent to 1/b, so you're just doing the division in a more roundabout fashion. If raising to a power were a faster way to do division, the compiler would probably already generate code to do it that way.
On the other hand, if you're doing a lot of division using the same denominator, you might gain something by turning a loop like:
for (int i = 0; i < some_size; i++)
    somearray[i] /= denom;
into:
double scale = 1.0 / denom;
for (int i = 0; i < some_size; i++)
    somearray[i] *= scale;
Of course, the compiler might be smart enough to do that as well, but at least with something like this, what you're doing stands a decent chance of being an improvement -- the only question is whether the compiler is already doing it for you.
Answer 3:
You should profile both versions and check which one's faster, but I'm almost prepared to eat my hat if it's the second, for several reasons:
- powf is a general function that is quite slow for typical values of the exponent parameter. Even if it's optimized to handle the exponent -1 specially, there will still be branches checking for that special case, which add overhead to the computation.
- Depending on the implementation, powf may not be inlined, and you will incur the cost of the function call.
- If k*(1.0/b) were really faster than k/b, then the compiler would already optimize the latter into the former (especially when specifying compiler flags that allow fast, but not necessarily perfectly IEEE-compliant, floating-point math).
The bottom line is that modern compilers are extremely good at performing micro-optimizations; I wouldn't dream of trying "tricks" like this one unless profiling and disassembly showed clear evidence that the compiler was doing something inefficient.
Answer 4:
Speed always depends on how well your compiler optimizes, so a reliable answer can only be found by measuring.
However, the second method involves a function call that the first does not, so my guess is that the first will be faster.
Answer 5:
As others have stated, profile.
In hardware, division is typically implemented by repeated shift-and-subtract steps, which may still be faster than multiplying by the inverse. I believe division would be faster because the inverse has to be calculated first (itself a division), and only then can the multiplication be performed. So while the division is carrying out its subtractions, the other method is still determining the inverse; once the inverse is known, the shifts and adds of a multiplication must still be performed.
See the Wikipedia article on division algorithms.
Answer 6:
Using gcc -O2 on my x86-64 machine, k / b translates to one instruction:
divss xmm0, xmm1
The 'powf' way results in the following:
movss xmm2, DWORD PTR .LC0[rip]
divss xmm2, xmm1
mulss xmm2, xmm0
movaps xmm0, xmm2
where .LC0 is 1065353216, or 1.0f.
So it would seem to me that, at least for the x86-64 machines that are common nowadays, k / b is faster.