Fast Arc Cos algorithm?

I have my own, very fast cos function:

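#include <math.h>

float sine(float x)
{
    const float pi = 3.14159265f;
    const float B = 4 / pi;
    const float C = -4 / (pi * pi);

    // Parabola through (0, 0), (pi/2, 1) and (pi, 0); intended for x in [-pi, pi].
    // fabsf is used because plain abs() may resolve to the integer version.
    float y = B * x + C * x * fabsf(x);

    //  const float Q = 0.775f;
    const float P = 0.225f;

    // Blend the parabola with its signed square to reduce the error.
    y = P * (y * fabsf(y) - y) + y;   // Q * y + P * y * abs(y)

    return y;
}

float cosine(float x)
{
    const float pi = 3.14159265f;
    return sine(x + (pi / 2));
}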

But now when I profile, I see that acos() is killing the processor. I don't need intense precision. What is a fast way to calculate acos(x)? Thanks.

Tags: c++ c algorithm math performance

10 answers


Post #2 · 2020-02-02 08:36

nVidia has some great resources that show how to approximate otherwise very expensive math functions, such as acos, asin, and atan2.

These algorithms produce good results when speed of execution is more important (within reason) than precision. Here's their acos function:

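// Absolute error <= 6.7e-5
// Named fast_acos here so it does not clash with acos() from the math library.
float fast_acos(float x) {
  float negate = (x < 0.0f) ? 1.0f : 0.0f;  // 1 for negative input, otherwise 0
  x = fabsf(x);
  float ret = -0.0187293f;                  // cubic in |x|, evaluated by Horner's scheme
  ret = ret * x;
  ret = ret + 0.0742610f;
  ret = ret * x;
  ret = ret - 0.2121144f;
  ret = ret * x;
  ret = ret + 1.5707288f;
  ret = ret * sqrtf(1.0f - x);
  ret = ret - 2 * negate * ret;             // flip the sign for negative input
  return negate * 3.14159265358979f + ret;  // acos(-x) = pi - acos(x)
}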

Here are the results for acos(0.5):

nVidia:   result: 1.0471513828611643
math.h:   result: 1.0471975511965976

That's pretty close! Depending on your required degree of precision, this might be a good option for you.
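
A single sample point says little about the worst case, so here is a minimal sketch for measuring the maximum error over the whole domain. It restates the polynomial so that it compiles on its own; the names `fast_acos` and `max_abs_error` and the sampling scheme are choices made for this illustration, not part of nVidia's material.

```cpp
#include <algorithm>
#include <cmath>

// The approximation quoted above, restated so this sketch is self-contained.
inline float fast_acos(float x) {
    const float negate = (x < 0.0f) ? 1.0f : 0.0f;
    x = std::fabs(x);
    float ret = -0.0187293f;
    ret = ret * x + 0.0742610f;
    ret = ret * x - 0.2121144f;
    ret = ret * x + 1.5707288f;
    ret *= std::sqrt(1.0f - x);
    ret -= 2.0f * negate * ret;
    return negate * 3.14159265f + ret;
}

// Largest |fast_acos(x) - std::acos(x)| over `samples` evenly spaced
// points in [-1, 1]. `samples` must be at least 2.
inline double max_abs_error(int samples) {
    double worst = 0.0;
    for (int i = 0; i < samples; ++i) {
        const double x = -1.0 + 2.0 * i / (samples - 1);
        const double err =
            std::fabs(fast_acos(static_cast<float>(x)) - std::acos(x));
        worst = std::max(worst, err);
    }
    return worst;
}
```

Calling `max_abs_error(20001)` and comparing the result with your own tolerance is a quick way to decide whether this approximation is accurate enough for your use.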


手持菜刀，她持情操

Post #3 · 2020-02-02 08:37

Another approach you could take is to use complex numbers. From de Moivre's formula,

ⅈ^x = cos(π/2*x) + ⅈ*sin(π/2*x)

Let θ = π/2*x. Then x = 2θ/π, so

sin(θ) = ℑ(ⅈ^(2θ/π))
cos(θ) = ℜ(ⅈ^(2θ/π))

How can you calculate powers of ⅈ without sin and cos? Start with a precomputed table of powers of ⅈ whose exponents are powers of 2:

ⅈ⁴ = 1
ⅈ² = -1
ⅈ¹ = ⅈ
ⅈ^(1/2) = 0.7071067811865476 + 0.7071067811865475*ⅈ
ⅈ^(1/4) = 0.9238795325112867 + 0.3826834323650898*ⅈ
ⅈ^(1/8) = 0.9807852804032304 + 0.19509032201612825*ⅈ
ⅈ^(1/16) = 0.9951847266721969 + 0.0980171403295606*ⅈ
ⅈ^(1/32) = 0.9987954562051724 + 0.049067674327418015*ⅈ
ⅈ^(1/64) = 0.9996988186962042 + 0.024541228522912288*ⅈ
ⅈ^(1/128) = 0.9999247018391445 + 0.012271538285719925*ⅈ
ⅈ^(1/256) = 0.9999811752826011 + 0.006135884649154475*ⅈ

To calculate arbitrary values of ⅈ^x, approximate the exponent as a binary fraction, and then multiply together the corresponding values from the table.

For example, to find sin and cos of 72° = 0.8π/2:

ⅈ^0.8 ≈ ⅈ^(205/256) = ⅈ^(0b0.11001101) = ⅈ^(1/2) * ⅈ^(1/4) * ⅈ^(1/32) * ⅈ^(1/64) * ⅈ^(1/256)
= 0.3078496400415349 + 0.9514350209690084*ⅈ

sin(72°) ≈ 0.9514350209690084 ("exact" value is 0.9510565162951535)
cos(72°) ≈ 0.3078496400415349 ("exact" value is 0.30901699437494745).
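
The table lookup and multiplication described above can be sketched roughly as follows. The function names are hypothetical, and the table is generated with the half-angle identities instead of being typed in, which is an assumption of this sketch and not something the answer prescribes.

```cpp
#include <cmath>
#include <complex>
#include <vector>

using Complex = std::complex<double>;

// table[k] = i^(1/2^(k+1)) for k = 0 .. bits-1.
// Each entry is the square root of the previous one, obtained from the
// half-angle identities, so building the table needs no sin/cos call.
inline std::vector<Complex> make_i_power_table(int bits) {
    std::vector<Complex> table;
    double c = 0.0, s = 1.0;  // start from i^1 = 0 + 1i
    for (int k = 0; k < bits; ++k) {
        const double c_half = std::sqrt((1.0 + c) / 2.0);
        const double s_half = s / (2.0 * c_half);
        c = c_half;
        s = s_half;
        table.emplace_back(c, s);
    }
    return table;
}

// Returns approximately cos(theta) + i*sin(theta) for theta in [0, pi/2].
// theta is rounded to the nearest multiple of (pi/2) / 2^bits, where
// bits is the table size (assumed to be well below 62).
inline Complex cis_from_table(double theta, const std::vector<Complex>& table) {
    const double kHalfPi = 1.5707963267948966;
    const int bits = static_cast<int>(table.size());
    const long long n = std::llround(theta / kHalfPi * std::ldexp(1.0, bits));
    Complex result(1.0, 0.0);
    if ((n >> bits) & 1) result *= Complex(0.0, 1.0);  // exponent rounded up to 1
    for (int k = 0; k < bits; ++k) {
        // Bit (bits-1-k) of n selects the factor i^(1/2^(k+1)).
        if ((n >> (bits - 1 - k)) & 1) result *= table[k];
    }
    return result;
}
```

With an 8-entry table and theta = 0.4π (72°), the exponent rounds to 205/256, so the real and imaginary parts of the result reproduce the cos and sin values of the worked example.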

To find asin and acos, you can use this table with the Bisection Method:

For example, to find asin(0.6) (the smallest angle in a 3-4-5 triangle):

ⅈ⁰ = 1 + 0*ⅈ. The sin is too small, so increase x by 1/2.
ⅈ^(1/2) = 0.7071067811865476 + 0.7071067811865475*ⅈ. The sin is too big, so decrease x by 1/4.
ⅈ^(1/4) = 0.9238795325112867 + 0.3826834323650898*ⅈ. The sin is too small, so increase x by 1/8.
ⅈ^(3/8) = 0.8314696123025452 + 0.5555702330196022*ⅈ. The sin is still too small, so increase x by 1/16.
ⅈ^(7/16) = 0.773010453362737 + 0.6343932841636455*ⅈ. The sin is too big, so decrease x by 1/32.
ⅈ^(13/32) = 0.8032075314806449 + 0.5956993044924334*ⅈ.

Each time you increase x, multiply by the corresponding power of ⅈ. Each time you decrease x, divide by the corresponding power of ⅈ.

If we stop here, we obtain asin(0.6) ≈ (13/32)*(π/2) = 0.6381360077604268. (The "exact" value is 0.6435011087932844.)

The accuracy, of course, depends on the number of iterations. For a quick-and-dirty approximation, use 10 iterations. For "intense precision", use 50-60 iterations.
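
As a rough sketch of that bisection, assuming an input in [0, 1]: the function name `asin_bisect` is made up for this example, and each power ⅈ^(1/2^k) is computed on the fly with the half-angle identities instead of being read from a precomputed table. Since every power of ⅈ has magnitude 1, dividing by one is the same as multiplying by its conjugate.

```cpp
#include <cmath>
#include <complex>

// Bisection search for asin(y), y in [0, 1], using only multiplications
// by powers of i. Returns the angle in radians after `iterations` steps.
inline double asin_bisect(double y, int iterations) {
    const double kHalfPi = 1.5707963267948966;
    std::complex<double> z(1.0, 0.0);     // i^x for the current x, starting at x = 0
    std::complex<double> step(0.0, 1.0);  // i^(1/2^k), starting at i^1
    double x = 0.0;
    double delta = 1.0;
    for (int k = 0; k < iterations; ++k) {
        // step <- sqrt(step), i.e. halve its angle.
        const double c = std::sqrt((1.0 + step.real()) / 2.0);
        step = std::complex<double>(c, step.imag() / (2.0 * c));
        delta *= 0.5;
        if (z.imag() < y) {         // sin too small: increase x
            z *= step;
            x += delta;
        } else if (z.imag() > y) {  // sin too big: decrease x
            z *= std::conj(step);
            x -= delta;
        } else {
            break;                  // exact hit
        }
    }
    return x * kHalfPi;
}
```

Five iterations with y = 0.6 follow the same path as the worked example and end at x = 13/32. An acos can then be obtained as π/2 minus this result.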


我想做一个坏孩纸

Post #4 · 2020-02-02 08:41

You can approximate the inverse cosine with a polynomial as suggested by dan04, but a polynomial is a pretty bad approximation near -1 and 1 where the derivative of the inverse cosine goes to infinity. When you increase the degree of the polynomial you hit diminishing returns quickly, and it is still hard to get a good approximation around the endpoints. A rational function (the quotient of two polynomials) can give a much better approximation in this case.

acos(x) ≈ π/2 + (ax + bx³) / (1 + cx² + dx⁴)

where

a = -0.939115566365855
b =  0.9217841528914573
c = -1.2845906244690837
d =  0.295624144969963174

has a maximum absolute error of 0.017 radians (0.96 degrees) on the interval (-1, 1).

[Plot for comparison: the inverse cosine in black, a cubic polynomial approximation in red, the above function in blue.]

The coefficients above have been chosen to minimise the maximum absolute error over the entire domain. If you are willing to allow a larger error at the endpoints, the error on the interval (-0.98, 0.98) can be made much smaller. A numerator of degree 5 and a denominator of degree 2 is about as fast as the above function, but slightly less accurate. At the expense of performance you can increase accuracy by using higher degree polynomials.

A note about performance: computing the two polynomials is still very cheap, and you can use fused multiply-add instructions. The division is not so bad, because you can use the hardware reciprocal approximation and a multiply. The error in the reciprocal approximation is negligible in comparison with the error in the acos approximation. On a 2.6 GHz Skylake i7, this approximation can do about 8 inverse cosines every 6 cycles using AVX. (That is throughput, the latency is longer than 6 cycles.)
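
For reference, a plain scalar version of the rational approximation might look like the sketch below. The function name `acos_rational` is chosen here for illustration; it uses an ordinary division and does not attempt the AVX or reciprocal-approximation tricks mentioned above.

```cpp
#include <cmath>

// acos(x) ~= pi/2 + (a*x + b*x^3) / (1 + c*x^2 + d*x^4), for x in [-1, 1].
inline float acos_rational(float x) {
    constexpr float a = -0.939115566365855f;
    constexpr float b =  0.9217841528914573f;
    constexpr float c = -1.2845906244690837f;
    constexpr float d =  0.295624144969963174f;
    constexpr float half_pi = 1.5707963267948966f;

    const float x2 = x * x;
    const float num = x * (a + b * x2);          // a*x + b*x^3
    const float den = 1.0f + x2 * (c + d * x2);  // 1 + c*x^2 + d*x^4
    return half_pi + num / den;
}
```

Both polynomials are written in Horner form in x², so each costs two multiply-adds that a compiler can fuse where the target supports it.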


放我归山

Post #5 · 2020-02-02 08:48

Here is a great website with many options: https://www.ecse.rpi.edu/Homepages/wrf/Research/Short_Notes/arcsin/onlyelem.html

Personally, I went with the Chebyshev-Pade quotient approximation, using the following code:

double arccos(double x) {
    const double pi = 3.141592653;
    const double t = 2*x - 1;  // both polynomials are written in powers of (2x - 1)
    return pi / 2 - (.5689111419 - .2644381021*x - .4212611542*t*t
         + .1475622352*t*t*t)
         / (2.006022274 - 2.343685222*x + .3316406750*t*t
         + .02607135626*t*t*t);
}


爷、活的狠高调

Post #6 · 2020-02-02 08:50

Unfortunately I do not have enough reputation to comment. Here is a small modification of Nvidia's function that handles inputs that should be <= 1 but come out slightly larger, while preserving performance as much as possible.

This may be important, since rounding errors can cause a number that should be 1.0 to be (oh so slightly) larger than 1.0, and sqrt(1.0-x) is then taken of a negative number.


double safer_acos(double x) {
  double negate = double(x < 0);
  x = fabs(x);  // fabs rather than abs, which may resolve to the integer version
  x -= double(x>1.0)*(x-1.0); // <- equivalent to min(1.0,x), but faster
  double ret = -0.0187293;
  ret = ret * x;
  ret = ret + 0.0742610;
  ret = ret * x;
  ret = ret - 0.2121144;
  ret = ret * x;
  ret = ret + 1.5707288;
  ret = ret * sqrt(1.0-x);
  ret = ret - 2 * negate * ret;
  return negate * 3.14159265358979 + ret;

  // In a single line (no gain using gcc)
  //return negate * 3.14159265358979 + (((((-0.0187293*x)+ 0.0742610)*x - 0.2121144)*x + 1.5707288)* sqrt(1.0-x))*(1.0-2.0*negate);

}


Fickle 薄情

Post #7 · 2020-02-02 08:53

Got spare memory? A lookup table (with interpolation, if required) is gonna be fastest.
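
A minimal sketch of such a table with linear interpolation is shown below. The class name `AcosTable` and the table size of 1025 entries are arbitrary choices for this illustration; tune the size against your own accuracy and cache budget.

```cpp
#include <array>
#include <cmath>
#include <cstddef>

// Lookup table for acos on [-1, 1] with linear interpolation.
class AcosTable {
public:
    AcosTable() {
        // Sample acos at kSize evenly spaced points from -1 to 1.
        for (std::size_t i = 0; i < kSize; ++i) {
            const double x = -1.0 + 2.0 * static_cast<double>(i) / (kSize - 1);
            table_[i] = static_cast<float>(std::acos(x));
        }
    }

    float operator()(float x) const {
        const float last = static_cast<float>(kSize - 1);
        const float pos = (x + 1.0f) * 0.5f * last;  // fractional table index
        // Out-of-range inputs are clamped; a NaN input also takes the
        // first branch and returns the entry for x = -1.
        if (!(pos > 0.0f)) return table_[0];
        if (pos >= last) return table_[kSize - 1];
        const std::size_t i = static_cast<std::size_t>(pos);
        const float t = pos - static_cast<float>(i);
        return table_[i] + t * (table_[i + 1] - table_[i]);
    }

private:
    static constexpr std::size_t kSize = 1025;
    std::array<float, kSize> table_{};
};
```

Build the table once (for example `const AcosTable fast_acos_lut;`) and call it like a function. The interpolation error is largest in the outermost intervals near -1 and 1, where the slope of acos grows without bound, so a denser table or a different method may be needed there.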
