I was wondering whether exp() is faster than the more general pow(). I ran a quick benchmark on JsPerf (http://jsperf.com/pow-vs-exp) and it showed interesting results:
Math.exp(logBase * exponent); // fastest
Math.exp(Math.log(base) * exponent); // middle
Math.pow(base, exponent); // slowest
I know that results will vary heavily with architecture and language, but I am also interested in the theoretical point of view. Is pow(a, b) implemented as exp(log(a) * b), or is there some more clever way to compute the power "directly" (in C++, C# or JavaScript)? Are there CPU instructions for exp, log or pow on some architectures?
As far as I know, both exp() and log() are computed using some Taylor series and are pretty expensive to compute. This makes me believe that for a constant base, this code
double logBase = log(123.456);
for (int i = 0; i < 1024; ++i) {
    exp(logBase * 654.321);
}
is better than this
for (int i = 0; i < 1024; ++i) {
    pow(123.456, 654.321);
}
Is that a correct assumption?
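For reference, this is roughly how I would time the two variants in C++ (just a sketch with the values and iteration count from above, not a rigorous benchmark):

#include <chrono>
#include <cmath>
#include <cstdio>

// Rough timing sketch, not a rigorous benchmark. The volatile reads keep the
// compiler from constant-folding the calls, and the volatile sink keeps the
// loops from being optimized away. The values are the ones from the question
// (the results overflow to +inf, which doesn't matter for timing).
int main() {
    volatile double base_v = 123.456, exponent_v = 654.321;
    const double base = base_v, exponent = exponent_v;
    volatile double sink = 0.0;

    const double logBase = std::log(base);      // hoisted out of the loop
    auto t0 = std::chrono::steady_clock::now();
    for (int i = 0; i < 1024; ++i)
        sink = std::exp(logBase * exponent);
    auto t1 = std::chrono::steady_clock::now();
    for (int i = 0; i < 1024; ++i)
        sink = std::pow(base, exponent);
    auto t2 = std::chrono::steady_clock::now();

    using ns = std::chrono::nanoseconds;
    std::printf("exp: %lld ns, pow: %lld ns\n",
                (long long)std::chrono::duration_cast<ns>(t1 - t0).count(),
                (long long)std::chrono::duration_cast<ns>(t2 - t1).count());
    return 0;
}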
Yes, exp will be faster than pow in general.

The exp and log functions will be optimized for the target platform; many techniques can be used, such as Padé approximation, linear or binary reduction followed by approximation, etc.

The pow function will generally be implemented as exp(log(a) * b), as you say, so it is obviously slower than exp alone. There are many special cases for pow, such as negative exponents, integral exponents, exponents equal to 1/2 or 1/3, etc. These slow down pow even further in the general case because the tests are expensive.

See this SO question on pow.
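To illustrate (this is only a sketch, not how any particular libm is actually written), a general pow has to classify its arguments before it can fall back to the exp(log(a) * b) path, and every one of those tests is work that a plain exp call never does:

#include <cmath>
#include <limits>

// Illustrative sketch of the kind of dispatch a general pow must do before it
// can fall back to exp(log(a) * b). Real library implementations are far more
// careful about accuracy and edge cases; this only shows why the general path
// carries extra tests that exp alone does not.
double naive_pow(double a, double b) {
    if (b == 0.0) return 1.0;                       // a^0 == 1
    if (a == 0.0) return b > 0.0 ? 0.0 : HUGE_VAL;  // 0^b
    double intpart;
    bool b_is_integer = std::modf(b, &intpart) == 0.0;
    if (a < 0.0) {
        if (!b_is_integer)                          // negative base, fractional exponent
            return std::numeric_limits<double>::quiet_NaN();
        double r = std::exp(std::log(-a) * b);      // work with |a|, fix the sign below
        return std::fmod(intpart, 2.0) != 0.0 ? -r : r;
    }
    if (b == 0.5)  return std::sqrt(a);             // common shortcuts
    if (b == -1.0) return 1.0 / a;
    return std::exp(std::log(a) * b);               // general case
}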
Regardless of the architecture details, Math.pow has to do more in terms of error checking than Math.exp (for example, what happens if the base is negative?), and as such I'd expect pow to be slower.

Relevant parts of the spec:
http://ecma-international.org/ecma-262/5.1/#sec-15.8.2.8
http://ecma-international.org/ecma-262/5.1/#sec-15.8.2.13
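C and C++'s pow has to do the same kind of argument classification, so the difference is easy to see outside JavaScript as well (a small sketch):

#include <cmath>
#include <cstdio>

// pow must classify its arguments (sign of the base, whether the exponent is
// an integer, zero, etc.) before it can compute anything, whereas exp accepts
// any finite double without such checks.
int main() {
    std::printf("%f\n", std::pow(-2.0, 0.5));            // nan: negative base, non-integer exponent
    std::printf("%f\n", std::pow(0.0, -1.0));            // inf: zero base, negative exponent
    std::printf("%f\n", std::exp(0.5 * std::log(2.0)));  // 1.414214: no argument classification needed
    return 0;
}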
As a partial answer: yes, there are instructions for exp, log or pow on some architectures. However, that doesn't necessarily mean much.
For example, on x86 there's

- f2xm1, which calculates 2^x - 1
- fscale, which calculates y * 2^((int)x)
- fyl2x, which calculates y * log2(x)
- fyl2xp1, which calculates y * log2(x + 1) (has restrictions on the input range)

However, they are not much used. It varies from architecture to architecture, but they're never fast. As a more extreme example, fyl2x has a latency of 724 on Sandy Bridge (pretty recent!); in that time, on the same processor, you could do about 700 independent floating point additions, or about 240 dependent floating point additions, or about 2000 independent simple integer operations.

That's about as bad as it gets, but they're typically slow. Slow enough that a manual implementation can beat them or at least not significantly lose.
Also, FPU code is slowly disappearing in favour of SSE code. There are no SSE equivalents of those instructions.
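For what it's worth, the decomposition those x87 instructions are built around can be written with standard C++ functions. This is purely illustrative (it only handles x > 0 and is less accurate than std::pow):

#include <cmath>

// pow(x, y) for x > 0, decomposed the way the x87 instructions suggest:
// fyl2x computes t = y * log2(x), then 2^t is split into an integer part
// (handled by fscale, here ldexp) and a fractional part (handled by f2xm1,
// here exp2).
double pow_via_log2(double x, double y) {
    double t = y * std::log2(x);          // fyl2x
    double i;
    double f = std::modf(t, &i);          // split into integer and fractional parts
    return std::ldexp(std::exp2(f), static_cast<int>(i)); // 2^f scaled by 2^i (fscale)
}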