I'm struggling to understand how TO IMPLEMENT the range reduction algorithm published by Payne and Hanek (range reduction for trigonometric functions)
I've seen there's this library: http://www.netlib.org/fdlibm/
But it looks to me so twisted, and all the theoretical explanation i've founded are too simple to provide an implementation.
Is there some good... good... good explanation of it?
As someone who once tried to implement it, I feel your pain (no pun intended).
Before attempting it, make sure you have a good understanding of when floating point operations are exact, and know how double-double arithmetic works.
Other than that, my best advice is to look at what other smart people have done:
ReduceFull
Performing argument reduction for trigonometric functions via the Payne-Hanek algorithm is actually pretty straightforward. As with other argument reduction schemes, compute
n = round_nearest (x / (π/2))
, then compute remainder viax - n * π/2
. Better efficiency is achieved by computingn = round_nearest (x * (2/π))
.The key observation in Payne-Hanek is that when computing the remainder of
x - n * π/2
using the full unrounded product the leading bits cancel during subtraction, so we do not need to compute those. We are left with the problem of finding the right starting point (non-zero bits) based on the magnitude ofx
. Ifx
is close to a multiple ofπ/2
, there may be additional cancellation, which is limited. One can consult the literature for an upper bound on the number of additional bits that cancel in such cases. Due to relatively high computational cost, Payne-Hanek is usually only used for arguments large in magnitude, which has the additional benefit that during subtraction the bits of the original argumentx
are zero in the relevant bit positions.Below I show exhaustively tested C99 code for single-precision
sinf()
that I wrote recently that incorporates Payne-Hanek reduction in the slow path of the reduction, seetrig_red_slowpath_f()
. Note that in order to achieve a faithfully roundedsinf()
one would have to augment the argument reduction to return the reduced argument as twofloat
operands in head/tail fashion.Various design choices are possible, below I opted for largely integer-based computation in order to minimize the storage for the required bits of
2/π
. Implementations using floating-point computation and overlapped pairs or triples of floating-point numbers to store the bits of2/π
are also common.