How to actually avoid floating point errors when y

Posted 2020-03-24 05:37

Question:

I am trying to affect the translation of a 3D model using some UI buttons to shift the position by 0.1 or -0.1.

My model position is a three dimensional float so simply adding 0.1f to one of the values causes obvious rounding errors. While I can use something like BigDecimal to retain precision, I still have to convert it from a float and back to a float at the end and it always results in silly numbers that are making my UI look like a mess.

I could just prettify the displayed values, but the rounding errors will only get worse with more editing, and they make my save files rather hard to read.

So how do I actually avoid these errors when I need to use a float?

Answer 1:

I would use a Rational class. There are many out there - this one looks like it should work.

There are two significant costs: converting the Rational to a float, and reducing the fraction by the gcd of numerator and denominator. The one I posted keeps the numerator and denominator in fully reduced state at all times, which should be quite efficient if you are always adding or subtracting 1/10.

This implementation holds the values normalised (i.e. with consistent sign) but unreduced.

You should choose your implementation to best fit your usage.
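To make the idea concrete, here is a minimal sketch of the fully reduced variant. The `Rational` class below is hypothetical and written only for illustration; it is not either of the implementations referred to above. It assumes numerators and denominators stay small enough that `long` arithmetic does not overflow.

```java
// Hypothetical minimal rational type, for illustration only.
final class Rational {
    final long num;
    final long den; // always positive after construction

    Rational(long num, long den) {
        if (den == 0) throw new ArithmeticException("zero denominator");
        if (den < 0) { num = -num; den = -den; }  // keep the sign on the numerator
        long g = gcd(Math.abs(num), den);          // reduce to lowest terms
        this.num = num / g;
        this.den = den / g;
    }

    private static long gcd(long a, long b) {
        while (b != 0) { long t = a % b; a = b; b = t; }
        return a;
    }

    Rational plus(Rational o) {
        return new Rational(num * o.den + o.num * den, den * o.den);
    }

    Rational minus(Rational o) {
        return new Rational(num * o.den - o.num * den, den * o.den);
    }

    // Rounding happens only here, when a float is needed for the model or the UI.
    float toFloat() {
        return (float) ((double) num / den);
    }

    @Override
    public String toString() {
        return den == 1 ? Long.toString(num) : num + "/" + den;
    }

    public static void main(String[] args) {
        Rational x = new Rational(0, 1);
        Rational step = new Rational(1, 10);
        for (int i = 0; i < 10; i++) {
            x = x.plus(step);
        }
        System.out.println(x + " -> " + x.toFloat()); // prints "1 -> 1.0"
    }
}
```

The position is kept as a Rational for its whole lifetime, and `toFloat()` is called only at the point where the renderer or the save file needs a float, so the error never accumulates across edits.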



Answer 2:

The Kahan summation and pairwise summation algorithms help to reduce floating point errors. Here's some Java code for the Kahan algorithm.
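A minimal sketch of Kahan (compensated) summation on floats follows. The class and method names are made up for this example, and it is not the code the answer originally pointed to.

```java
// Illustrative sketch of Kahan summation; names are hypothetical.
final class KahanDemo {
    /** Sums the values while carrying a compensation term for lost low-order bits. */
    static float kahanSum(float[] values) {
        float sum = 0f;
        float c = 0f;                // running compensation
        for (float v : values) {
            float y = v - c;         // apply the correction from the previous step
            float t = sum + y;       // low-order bits of y may be lost here
            c = (t - sum) - y;       // recover what was lost (with opposite sign)
            sum = t;
        }
        return sum;
    }

    /** Plain left-to-right summation, for comparison. */
    static float naiveSum(float[] values) {
        float sum = 0f;
        for (float v : values) {
            sum += v;
        }
        return sum;
    }

    public static void main(String[] args) {
        float[] steps = new float[1000];
        java.util.Arrays.fill(steps, 0.1f);
        System.out.println("naive: " + naiveSum(steps));
        System.out.println("kahan: " + kahanSum(steps));
    }
}
```

For the button use case, the same idea would mean storing the compensation term `c` next to each coordinate and updating both on every click. Note that this only reduces accumulated error; 0.1f itself is still not exactly 0.1.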



Answer 3:

A simple solution is to use fixed precision, i.e. an integer that is 10x or 100x the value you want.

float f = 10;
f += 0.1f;

becomes

int i = 100;
i += 1;  // repeat as many times as you like
// use i / 10.0 wherever the actual value is required.

I wouldn't use float in any case, as you get more rounding errors than with double for next to no benefit (unless you have millions of float values). double gives you about 8 more decimal digits of precision, and with sensible rounding you won't see those errors.
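Going back to the fixed-precision idea, here is a small sketch that contrasts repeatedly adding 0.1f with counting tenths in an int and converting once. The class and the `fromTenths` helper are hypothetical names used only for this example.

```java
// Illustrative comparison: accumulating 0.1f versus counting tenths in an int.
final class TenthsDemo {
    /** Converts a count of tenths to a float with a single, correctly rounded division. */
    static float fromTenths(int tenths) {
        return tenths / 10.0f;
    }

    public static void main(String[] args) {
        float accumulated = 0f;
        int tenths = 0;
        for (int i = 0; i < 1000; i++) {
            accumulated += 0.1f; // rounding error builds up on every step
            tenths += 1;         // exact integer arithmetic
        }
        System.out.println("accumulated float: " + accumulated);
        System.out.println("from int tenths:   " + fromTenths(tenths)); // prints 100.0
    }
}
```

Because the division happens once and is correctly rounded, the result is the float nearest to the intended decimal value, so it displays as a short number such as 0.7 rather than a long tail of digits. The int is also what you would write to the save file.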



Answer 4:

If you stick with floats, the easiest way to avoid the error is to use a step that is exactly representable as a float but close to the desired value, namely

round(2^n * value) * 1/2^n.

Here n is the number of fractional bits and value is the number you want to approximate (in your case 0.1).

In your case with increasing precision:

n = 4 => 0.125
n = 8 (byte) => 0.1015625
n = 16 (short)=> 0.100006103516....

The long digit strings are artefacts of writing a binary fraction in decimal; the binary value itself has far fewer bits.

As these floats are exact, addition and subtraction will not introduce accumulated errors; the results stay predictable as long as they fit within the float's 24-bit significand.
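The formula above can be sketched as follows. The `DyadicStep` class and `dyadic` method are hypothetical names chosen for this example; the code only illustrates that a step of the form k/2^n accumulates without rounding error.

```java
// Illustrative sketch: snap a step size to the nearest multiple of 2^-n.
final class DyadicStep {
    /** Returns round(2^n * value) / 2^n as a float. */
    static float dyadic(double value, int n) {
        double scale = Math.scalb(1.0, n); // 2^n
        return (float) (Math.round(value * scale) / scale);
    }

    public static void main(String[] args) {
        float step = dyadic(0.1, 8);       // 26/256
        System.out.println(step);          // prints 0.1015625

        float x = 0f;
        for (int i = 0; i < 1000; i++) {
            x += step;                     // every partial sum is exact
        }
        System.out.println(x == 1000 * step); // prints true
    }
}
```

The trade-off is that the step is no longer exactly 0.1, so the displayed values are multiples of 0.1015625 (or whichever n you pick) rather than round decimal numbers.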

If you fear that your display will be compromised by this solution (because the values are odd-looking floats), keep and store only an integer step count, incremented or decremented by 1. The value actually set internally is then

x = value * step.

Because the step count changes by exactly 1 each time, precision is retained: x is recomputed from the integer with a single multiplication, so no error accumulates.