The probability density function of the standard normal distribution is defined as e^(-x²/2) / √(2π). This can be rendered into C code in a straightforward manner. A sample single-precision implementation might be:
#include <math.h>

float my_normpdff (float a)
{
    return 0x1.988454p-2f * expf (-0.5f * a * a); /* 1/sqrt(2*pi) */
}
While this code is free from premature underflow, there is an accuracy issue: the error incurred in the computation of a²/2 is magnified by the subsequent exponentiation. One can easily demonstrate this with tests against higher-precision references. The exact error will differ based on the accuracy of the exp() or expf() implementation used; for faithfully rounded exponentiation functions, one would typically observe a maximum error of around 26 ulps for IEEE-754 binary32 (single precision) and around 29 ulps for IEEE-754 binary64 (double precision).
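A minimal sketch of such a test (the strided scan of [-12, 12] and the use of a double-precision reference are choices made here for demonstration; the double reference is accurate enough to expose the multi-ulp error in question, while careful ulp-error measurement would use an even higher-precision reference and exhaustive enumeration of arguments):

#include <stdio.h>
#include <math.h>

extern float my_normpdff (float a); /* implementation under test */

int main (void)
{
    double max_err = 0.0;
    /* strided scan; an exhaustive test would walk all binary32 values */
    for (float a = -12.0f; a <= 12.0f; a += 0x1.0p-12f) {
        double ref = exp (-0.5 * (double)a * (double)a) * 0.3989422804014327;
        float res = my_normpdff (a);
        float rf = (float)ref;
        double ulp = nextafterf (rf, INFINITY) - rf; /* ulp at the reference */
        double err = fabs ((double)res - ref) / ulp;
        if (err > max_err) max_err = err;
    }
    printf ("max. error: %.2f ulp\n", max_err);
    return 0;
}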
How can the accuracy issue be addressed in a reasonably efficient manner? A trivial approach would be to employ higher-precision intermediate computation, for example, using double computation for the float implementation. But this approach does not work for a double implementation if floating-point arithmetic of higher precision is not easily available, and it may be inefficient for a float implementation if double arithmetic is significantly more expensive than float computation, e.g. on many GPUs.
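For the float case on platforms where double arithmetic is cheap, such a version might look as follows (a straightforward sketch; the function name is chosen here for illustration):

#include <math.h>

float my_normpdff_via_dbl (float a)
{
    double x = (double)a;
    return (float)(0.3989422804014327 * exp (-0.5 * x * x)); /* 1/sqrt(2*pi) */
}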
The accuracy issue raised in the question can be addressed effectively, and efficiently, by the use of limited amounts of double-float or double-double computation, facilitated by the fused multiply-add (FMA) operation. This operation has been available since C99 via the standard math functions fmaf(a,b,c) and fma(a,b,c), which compute a*b+c without rounding of the intermediate product. While these functions map directly to fast hardware instructions on almost all modern processors, they may fall back to emulation code on older platforms, in which case they can be very slow. FMA allows the computation of a product with twice the native precision using just two operations, resulting in a head:tail pair of native-precision numbers:
The high-order bits of the result can be passed to the exponentiation, while the low-order bits are used to improve the accuracy of the result via linear interpolation, taking advantage of the fact that e^x is its own derivative:
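Since e^(h+l) = e^h · e^l ≈ e^h · (1 + l) for the tiny tail l, this correction costs a single FMA:

float e = expf (h);  /* exponentiation of the head */
e = fmaf (e, l, e);  /* e * (1 + l) ~= exp (h + l) */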
This allows us to eliminate most of the error of the naive implementation. The other, minor, source of computational error is the limited precision with which the constant 1/√(2π) is represented. This can be improved by using a head:tail representation of the constant that provides twice the native precision, and computing:
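With c_hi denoting the constant rounded to native precision and c_lo its rounding residual (names chosen here for exposition), this is again a single extra FMA:

float r = fmaf (c_hi, e, c_lo * e); /* (c_hi + c_lo) * e */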
The following paper points out that this technique can even result in correctly-rounded multiplication for some arbitrary-precision constants:
Nicolas Brisebarre and Jean-Michel Muller, "Correctly rounded multiplication by arbitrary precision constants", IEEE Transactions on Computers, Vol. 57, No. 2, February 2008, pp. 165-174
Combining the two techniques, and taking care of a few corner cases involving NaNs, we arrive at the following float implementation based on IEEE-754 binary32:
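A sketch of such an implementation, structured as described above (the tail constant RCP_SQRT_TWO_PI_LO is a value computed here as the rounding residual of the head, and should be verified with an arbitrary-precision tool):

#include <math.h>

float my_normpdff (float a)
{
    /* 1/sqrt(2*pi) as a head:tail pair; the tail is an assumed value
       representing the head's rounding residual (verify independently) */
    const float RCP_SQRT_TWO_PI_HI =  0x1.988454p-2f;  /*  3.98942292e-1 */
    const float RCP_SQRT_TWO_PI_LO = -0x1.857936p-27f; /* -1.13351625e-8 */
    float t, sh, sl, e;
    t  = -0.5f * a;            /* exact: scaling by a power of two */
    sh = t * a;                /* head of -a*a/2 */
    sl = fmaf (t, a, -sh);     /* tail of -a*a/2: exact rounding error */
    e  = expf (sh);
    if (e != 0.0f) {           /* skip correction when a is infinite (sl is NaN) */
        e = fmaf (e, sl, e);   /* e * (1 + sl) ~= exp (sh + sl) */
    }
    return fmaf (RCP_SQRT_TWO_PI_HI, e, RCP_SQRT_TWO_PI_LO * e);
}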
The corresponding double implementation, based on IEEE-754 binary64, looks almost identical, except for the different constant values used:
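Again a sketch, with the constants derived analogously; the tail constant in particular is an assumed value for the head's rounding residual and should be verified with an arbitrary-precision tool:

#include <math.h>

double my_normpdf (double a)
{
    /* 1/sqrt(2*pi) as a head:tail pair; the tail is an assumed value
       representing the head's rounding residual (verify independently) */
    const double RCP_SQRT_TWO_PI_HI =  0x1.9884533d43651p-2;  /* 3.9894228040143270e-1 */
    const double RCP_SQRT_TWO_PI_LO = -0x1.cbc0d30ebfd15p-56; /* approx. -2.492e-17 */
    double t, sh, sl, e;
    t  = -0.5 * a;             /* exact: scaling by a power of two */
    sh = t * a;                /* head of -a*a/2 */
    sl = fma (t, a, -sh);      /* tail of -a*a/2: exact rounding error */
    e  = exp (sh);
    if (e != 0.0) {            /* skip correction when a is infinite (sl is NaN) */
        e = fma (e, sl, e);    /* e * (1 + sl) ~= exp (sh + sl) */
    }
    return fma (RCP_SQRT_TWO_PI_HI, e, RCP_SQRT_TWO_PI_LO * e);
}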
The accuracy of these implementations depends on the accuracy of the standard math functions expf() and exp(), respectively. Where the C math library provides faithfully-rounded versions of those, the maximum error of either of the two implementations above is typically less than 2.5 ulps.