Pow() vs. exp() performance

2020-07-04 07:04发布

I was wondering if exp() is faster than more general pow(). I run fast benchmark on JsPerf http://jsperf.com/pow-vs-exp and it shown interesting results for me.

Math.exp(logBase * exponent);  // fastest
Math.exp(Math.log(base) * exponent);  // middle
Math.pow(base, exponent);  // slowest

I know that results will heavily vary on architecture and language but I am also interested in theoretical point of view. Is pow(a, b) implemented as exp(log(a) * b) or is there some more clever way how co compute power "directly" (in C++, C# or JavaScript). Are there CPU instructions for exp, log or pow on some architectures?

As far as I know, both exp() and log() are computed using some Taylor series and are pretty expensive to compute. This makes me believe that for constant base of power, this code

double logBase = log(123.456);
for (int i = 0; i < 1024; ++i) {
    exp(logBase * 654.321);
}

is better than this

for (int i = 0; i < 1024; ++i) {
    pow(123.456, 654.321);
}

Is that correct assumption?

3条回答
甜甜的少女心
2楼-- · 2020-07-04 07:41

Yes, exp will be faster than pow in general.

The exp and log functions will be optimized for the target platform; many techniques can be used such as Pade approximation, linear or binary reduction followed by approximation, etc.

The pow function will generally be implemented as exp(log(a) * b) as you say, so it is obviously slower than exp alone. There are many special cases for pow such as negative exponents, integral exponents, exponents equal to 1/2 or 1/3, etc. These will slow down pow even further in the general case because these tests are expensive.

See this SO question on pow.

查看更多
聊天终结者
3楼-- · 2020-07-04 07:41

Regardless of the architecture details, Math.pow has to do more in terms of error checking (for example, what happens if the base is negative?). than Math.exp (and as such I'd expect pow to be slower).

Relevant parts of the spec:

http://ecma-international.org/ecma-262/5.1/#sec-15.8.2.8

15.8.2.8 exp (x)

Returns an implementation-dependent approximation to the exponential function of x (e raised to the power of x, where e is the base of the natural logarithms).

If x is NaN, the result is NaN. If x is +0, the result is 1. If x is −0, the result is 1. If x is +∞, the result is +∞. If x is −∞, the result is +0.

http://ecma-international.org/ecma-262/5.1/#sec-15.8.2.13

15.8.2.13 pow (x, y)

Returns an implementation-dependent approximation to the result of raising x to the power y.

If y is NaN, the result is NaN. If y is +0, the result is 1, even if x is NaN. If y is −0, the result is 1, even if x is NaN. If x is NaN and y is nonzero, the result is NaN. If abs(x)>1 and y is +∞, the result is +∞. If abs(x)>1 and y is −∞, the result is +0. If abs(x)==1 and y is +∞, the result is NaN. If abs(x)==1 and y is −∞, the result is NaN. If abs(x)<1 and y is +∞, the result is +0. If abs(x)<1 and y is −∞, the result is +∞. If x is +∞ and y>0, the result is +∞. If x is +∞ and y<0, the result is +0. If x is −∞ and y>0 and y is an odd integer, the result is −∞. If x is −∞ and y>0 and y is not an odd integer, the result is +∞. If x is −∞ and y<0 and y is an odd integer, the result is −0. If x is −∞ and y<0 and y is not an odd integer, the result is +0. If x is +0 and y>0, the result is +0. If x is +0 and y<0, the result is +∞. If x is −0 and y>0 and y is an odd integer, the result is −0. If x is −0 and y>0 and y is not an odd integer, the result is +0. If x is −0 and y<0 and y is an odd integer, the result is −∞. If x is −0 and y<0 and y is not an odd integer, the result is +∞. If x<0 and x is finite and y is finite and y is not an integer, the result is NaN.

查看更多
狗以群分
4楼-- · 2020-07-04 07:54

As a partial answer, there are instructions for exp, log or pow on some architectures yes. However, that doesn't necessarily mean much.

For example, on x86 there's

  • f2xm1 which calculates 2x - 1
  • fscale which calculates y * 2(int)x
  • fyl2x which calculates y * log2 x
  • fyl2xp1 which calculates y * log2(x + 1) (has restrictions on input range)

However, they are not much used. It varies from architecture to architecture, but they're never fast. As a more extreme example, fyl2x has a latency of 724 on Sandy Bridge (pretty recent!), in that time on the same processor you could do about 700 independent floating point additions, or about 240 dependent floating point additions, or about 2000 independent simple integer operations.

That's about as bad as it gets, but they're typically slow. Slow enough that a manual implementation can beat them or at least not significantly lose.

Also, FPU code is slowly disappearing in favour of SSE code. There are no SSE equivalents of those instructions.

查看更多
登录 后发表回答