I would like to introduce some artificial precision loss into two numbers being compared to smooth out minor rounding errors, so that I don't have to use the `Math.abs(x - y) < eps` idiom in every comparison involving `x` and `y`.
Essentially, I want something that behaves similarly to down-casting a `double` to a `float` and then up-casting it back to a `double`, except I want to also preserve very large and very small exponents and I want some control over the number of significand bits preserved.
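For reference, the `float` round trip mentioned above can be sketched as follows. The names `FloatRoundTrip` and `viaFloat` are made up for illustration. The sketch only shows why a plain cast is not enough: the significand width cannot be chosen, and exponents outside the `float` range collapse to infinity or zero.

```java
final class FloatRoundTrip {
    // Hypothetical helper: narrow to float, then widen back to double.
    static double viaFloat(double x) {
        return (double) (float) x;
    }

    public static void main(String[] args) {
        System.out.println(viaFloat(0.1) == 0.1); // false: low significand bits were rounded away
        System.out.println(viaFloat(1e300));      // Infinity: exponent too large for a float
        System.out.println(viaFloat(1e-300));     // 0.0: exponent too small for a float
    }
}
```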
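Given the following function that produces the binary representation of the significand of a 64-bit IEEE 754 number:

    public static String significand(double d) {
        int SIGN_WIDTH = 1;
        int EXP_WIDTH = 11;
        int SIGNIFICAND_WIDTH = 53; // includes the implicit leading bit, which is not stored
        // Pad the raw bits to 64 characters so the field offsets below are fixed.
        String s = String.format("%64s", Long.toBinaryString(Double.doubleToRawLongBits(d))).replace(' ', '0');
        // Skip the sign and exponent fields and return the 52 stored significand bits.
        return s.substring(SIGN_WIDTH + EXP_WIDTH, SIGN_WIDTH + EXP_WIDTH + SIGNIFICAND_WIDTH - 1);
    }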
I want a function `reducePrecision(double x, int bits)` that reduces the precision of the significand of a `double` such that:

    significand(reducePrecision(x, bits)).substring(bits).equals(String.format("%0" + (52 - bits) + "d", 0))

In other words, every bit after the `bits` most significant bits in the significand of `reducePrecision(x, bits)` should be 0, while the `bits` most significant bits in the significand of `reducePrecision(x, bits)` should reasonably approximate the `bits` most significant bits in the significand of `x`.
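The same requirement can also be phrased as a bit mask over the raw representation, which avoids building strings. This is only an equivalent restatement of the condition above, not part of the original question. `TrailingBits` and `lowBitsAreZero` are hypothetical names, and `bits` is assumed to lie between 0 and 52.

```java
final class TrailingBits {
    // Number of explicitly stored significand bits in a 64-bit IEEE 754 double.
    static final int STORED_SIGNIFICAND_BITS = 52;

    // True when every stored significand bit after the `bits` most significant ones is 0.
    // Assumes 0 <= bits <= 52.
    static boolean lowBitsAreZero(double d, int bits) {
        long mask = (1L << (STORED_SIGNIFICAND_BITS - bits)) - 1;
        return (Double.doubleToRawLongBits(d) & mask) == 0L;
    }

    public static void main(String[] args) {
        System.out.println(lowBitsAreZero(1.5, 1));  // true: only the top significand bit is set
        System.out.println(lowBitsAreZero(0.1, 10)); // false: 0.1 uses the whole significand
    }
}
```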
Suppose `x` is the number you wish to reduce the precision of and `bits` is the number of significant bits you wish to retain.

When `bits` is sufficiently large and the order of magnitude of `x` is sufficiently close to 0, then `x * (1L << (bits - Math.getExponent(x)))` will scale `x` so that the bits that need to be removed will appear in the fractional component (after the radix point) while the bits that will be retained will appear in the integer component (before the radix point). You can then round this to remove the fractional component and then divide the rounded number by `(1L << (bits - Math.getExponent(x)))` to restore the order of magnitude of `x`.
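A minimal sketch of this first attempt, assuming `Math.round` is used for the rounding step, might look like the following. The class name `ReducePrecisionShift` and the local names `shift` and `scale` are made up for illustration, and the method is only meaningful while `shift` stays between 0 and 62.

```java
final class ReducePrecisionShift {
    // Sketch of the shift-based attempt; assumes 0 <= bits - Math.getExponent(x) <= 62.
    static double reducePrecision(double x, int bits) {
        int shift = bits - Math.getExponent(x);
        double scale = (double) (1L << shift);  // 2^shift, exactly representable as a double
        // Move the retained bits into the integer part, round, then scale back.
        return Math.round(x * scale) / scale;
    }

    public static void main(String[] args) {
        System.out.println(reducePrecision(Math.PI, 8)); // 3.140625
        System.out.println(reducePrecision(0.1, 4));     // 0.1015625
    }
}
```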
`(1L << exponent)` will break down when `Math.getExponent(x) > bits || Math.getExponent(x) < bits - 62`. The solution is to use `Math.pow(2, exponent)` (or the fast `pow2(exponent)` implementation from this answer) to calculate a fractional, or a very large, power of 2.
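Under the same assumptions as before, the `Math.pow` variant could be sketched like this (`ReducePrecisionPow` is a hypothetical name). Since both arguments are integers, `Math.pow` returns the power of 2 exactly whenever it is representable, so inputs far from 1 now work.

```java
final class ReducePrecisionPow {
    // Sketch of the Math.pow-based attempt; the shift may now be negative or larger than 62.
    static double reducePrecision(double x, int bits) {
        int shift = bits - Math.getExponent(x);
        double scale = Math.pow(2, shift);
        // Math.round still returns a long, so the scaled value must fit into one.
        return Math.round(x * scale) / scale;
    }

    public static void main(String[] args) {
        System.out.println(reducePrecision(1e20, 8));  // shift is negative here
        System.out.println(reducePrecision(1e-20, 8)); // shift is larger than 62 here
    }
}
```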
`Math.pow(2, exponent)` will break down as `exponent` approaches -1074 or +1023. The solution is to use `Math.scalb(x, exponent)` so that the power of 2 doesn't have to be explicitly calculated.
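A sketch of the `Math.scalb` variant, still using `Math.round` for the rounding step, is shown below. `ReducePrecisionScalb` is a hypothetical name. The explicit cast matters: without it, the `long` result would select the `scalb(float, int)` overload.

```java
final class ReducePrecisionScalb {
    // Sketch of the scalb-based attempt: scaling by 2^shift never builds the power of 2 itself.
    static double reducePrecision(double x, int bits) {
        int shift = bits - Math.getExponent(x);
        long rounded = Math.round(Math.scalb(x, shift));
        // Cast to double so that scalb(double, int) is chosen rather than scalb(float, int).
        return Math.scalb((double) rounded, -shift);
    }

    public static void main(String[] args) {
        System.out.println(reducePrecision(Math.PI, 8)); // 3.140625
        System.out.println(reducePrecision(1e300, 8));   // stays close to 1e300
    }
}
```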
`Math.round(y)` returns a `long`, so it does not preserve `Infinity`, `NaN`, and cases where `Math.abs(x) > Long.MAX_VALUE / Math.pow(2, exponent)`. Furthermore, `Math.round(y)` always rounds ties to positive infinity (e.g. `Math.round(0.5) == 1 && Math.round(1.5) == 2`). The solution is to use `Math.rint(y)` to receive a `double` and preserve the unbiased IEEE 754 round-to-nearest, ties-to-even rule (e.g. `Math.rint(0.5) == 0.0 && Math.rint(1.5) == 2.0`).

Finally, here is a unit test confirming our expectations:
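A minimal sketch of what the final version and such a check could look like, combining `Math.scalb` with `Math.rint`, follows. The class name `ReducePrecisionRint`, the helper `lowBitsAreZero`, and the sample values are assumptions made for illustration; they are not taken from the original answer.

```java
final class ReducePrecisionRint {
    // Sketch of the final approach: scale with scalb, round with rint, scale back.
    static double reducePrecision(double x, int bits) {
        int shift = bits - Math.getExponent(x);
        return Math.scalb(Math.rint(Math.scalb(x, shift)), -shift);
    }

    // Hypothetical check: the low (52 - bits) stored significand bits must all be 0.
    static boolean lowBitsAreZero(double d, int bits) {
        return (Double.doubleToRawLongBits(d) & ((1L << (52 - bits)) - 1)) == 0L;
    }

    public static void main(String[] args) {
        int bits = 8;
        double[] samples = {Math.PI, 0.1, 1e300, 1e-300, -123456.789};
        for (double x : samples) {
            double y = reducePrecision(x, bits);
            if (!lowBitsAreZero(y, bits)) throw new AssertionError("low bits set for " + x);
            if (Math.abs(y - x) > Math.abs(x) / 256) throw new AssertionError("poor approximation of " + x);
        }
        // Ties go to the even neighbour: 1 + 1/512 scales to 256.5, which rint rounds to 256.
        if (reducePrecision(1.001953125, bits) != 1.0) throw new AssertionError("tie not rounded to even");
        // Non-finite inputs pass through unchanged.
        if (!Double.isNaN(reducePrecision(Double.NaN, bits))) throw new AssertionError("NaN lost");
        if (reducePrecision(Double.POSITIVE_INFINITY, bits) != Double.POSITIVE_INFINITY) throw new AssertionError("Infinity lost");
        System.out.println("all checks passed");
    }
}
```

One caveat of this sketch: `Math.getExponent` returns `Double.MIN_EXPONENT - 1` for zero and subnormal inputs, so subnormals are scaled as if their exponent were -1023 and can lose more precision than requested, with very small ones rounding to zero.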
And its output: