Double in place of Float and Float rounding

Published 2019-09-07 17:09

Question:

Edit: This question covers two topics:

  • The efficiency of using double in place of float
  • Float precision following rounding

Is there any reason why I should not always use Java double instead of float?

I ask because the test code below fails when using floats, and it is not clear why, since the only difference is the use of float instead of double.

public class BigDecimalTest {

    @Test
    public void testDeltaUsingDouble() { // test passes
        BigDecimal left = new BigDecimal("0.99").setScale(2, BigDecimal.ROUND_DOWN);
        BigDecimal right = new BigDecimal("0.979").setScale(2, BigDecimal.ROUND_DOWN);

        Assert.assertEquals(left.doubleValue(), right.doubleValue(), 0.09);
        Assert.assertEquals(left.doubleValue(), right.doubleValue(), 0.03);

        Assert.assertNotEquals(left.doubleValue(), right.doubleValue(), 0.02);
        Assert.assertNotEquals(left.doubleValue(), right.doubleValue(), 0.01);
        Assert.assertNotEquals(left.doubleValue(), right.doubleValue(), 0.0);
    }

    @Test
    public void testDeltaUsingFloat() { // test fails on 'failing assert'
        BigDecimal left = new BigDecimal("0.99").setScale(2, BigDecimal.ROUND_DOWN);
        BigDecimal right = new BigDecimal("0.979").setScale(2, BigDecimal.ROUND_DOWN);

        Assert.assertEquals(left.floatValue(), right.floatValue(), 0.09);
        Assert.assertEquals(left.floatValue(), right.floatValue(), 0.03);

        /* failing assert */
        Assert.assertNotEquals(left.floatValue() + " - " + right.floatValue() + " = "
                + (left.floatValue() - right.floatValue()),
                left.floatValue(), right.floatValue(), 0.02);
        Assert.assertNotEquals(left.floatValue(), right.floatValue(), 0.01);
        Assert.assertNotEquals(left.floatValue(), right.floatValue(), 0.0);
    }
}

Fail Message:

java.lang.AssertionError: 0.99 - 0.97 = 0.01999998. Actual: 0.9900000095367432
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.failEquals(Assert.java:185)
at org.junit.Assert.assertNotEquals(Assert.java:230)
at com.icode.common.BigDecimalTest.testDeltaUsingFloat(BigDecimalTest.java:34)

Any idea why this test fails, and why I shouldn't just always use double instead of float? Of course, I mean a reason other than that a double is wider than a float.

Edit: The funny thing is that Assert.assertNotEquals(double, double, delta) takes double in both cases, so the returned floats in the failing test are widened to double anyway. Why the test failure then?
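The widening itself does not change anything: converting a float to double preserves the float's already-rounded value exactly, not the original decimal. A minimal sketch (class name WideningDemo is my own):

```java
public class WideningDemo {
    public static void main(String[] args) {
        float f = 0.99f;        // nearest float to 0.99, which is slightly above 0.99
        double d = f;           // widening conversion: the value is carried over exactly
        System.out.println(d);  // 0.9900000095367432 (same as in the fail message)
        // the double literal 0.99 is a *different* value (the nearest double to 0.99)
        System.out.println(d == 0.99); // false
    }
}
```

So by the time assertNotEquals sees the operands, the damage from the float rounding is already done.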

Edit: Maybe this other question is related, not sure though: hex not the same

Edit: From the answer to the question hex not the same, it can be concluded that the IEEE 754 representation of .99 as a float is different from its representation as a double. This is due to rounding.

Hence we get this:

  • 0.99 - 0.97 = 0.01999998 //in float case
  • 0.99 - 0.97 = 0.020000000000000018 //in double case

Since the max delta in the failing assert is 0.02, and 0.01999998 is below that delta, the two float values are considered equal; but the test asserts they are not, hence the failure.
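The two subtractions above can be checked directly (class name DeltaDemo is my own):

```java
public class DeltaDemo {
    public static void main(String[] args) {
        // float: 0.97f rounds up by more than 0.99f does, so the difference
        // lands just below 0.02 and falls inside the delta
        System.out.println(0.99f - 0.97f); // 0.01999998
        // double: both operands round down, 0.97 by more, so the difference
        // lands just above 0.02 and falls outside the delta
        System.out.println(0.99 - 0.97);   // 0.020000000000000018
    }
}
```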

Guys do you agree with all this?

Answer 1:

The documentation for BigDecimal is silent about how floatValue() rounds. I presume it uses round-to-nearest, ties-to-even.

left and right are set to .99 and .97, respectively. When these are converted to double in round-to-nearest mode, the results are 0.9899999999999999911182158029987476766109466552734375 (in hexadecimal floating-point, 0x1.fae147ae147aep-1) and 0.9699999999999999733546474089962430298328399658203125 (0x1.f0a3d70a3d70ap-1). When those are subtracted, the result is 0.020000000000000017763568394002504646778106689453125, which clearly exceeds .02.

When .99 and .97 are converted to float, the results are 0.9900000095367431640625 (0x1.fae148p-1) and 0.9700000286102294921875 (0x1.f0a3d8p-1). When those are subtracted, the result is 0.019999980926513671875, which is clearly less than .02.
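The hexadecimal forms and exact decimal values quoted above can be printed directly from Java; Double.toHexString/Float.toHexString show the rounded binary significands, and the BigDecimal(double) constructor shows the exact decimal value a binary floating-point number represents (class name ExactValueDemo is my own):

```java
import java.math.BigDecimal;

public class ExactValueDemo {
    public static void main(String[] args) {
        System.out.println(Double.toHexString(0.99)); // 0x1.fae147ae147aep-1
        System.out.println(Float.toHexString(0.99f)); // 0x1.fae148p-1
        // exact decimal value of the nearest double to 0.99 (slightly below 0.99)
        System.out.println(new BigDecimal(0.99));
        // exact decimal value of the nearest float to 0.99 (slightly above 0.99);
        // the float widens to double without any change in value
        System.out.println(new BigDecimal((double) 0.99f));
    }
}
```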

Simply put, when a decimal numeral is converted to floating-point, the rounding may be up or down. It depends on where the number happens to lie relative to the nearest representable floating-point values. If it is not controlled or analyzed, it is practically random. Thus, sometimes you end up with a greater value than you might have expected, and sometimes you end up with a lesser value.

Using double instead of float would not guarantee that results similar to the above do not occur. It is merely happenstance that the double value in this case exceeded the exact mathematical value and the float value did not. With other numbers, it could be the other way around. For example, with double, .09 - .07 is less than .02, but, with float, .09f - .07f is greater than .02.
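That reversed case is easy to verify (class name DirectionDemo is my own):

```java
public class DirectionDemo {
    public static void main(String[] args) {
        // here it is double that lands below .02 and float that lands above it,
        // the opposite of the .99 - .97 case
        System.out.println(0.09 - 0.07 < 0.02);    // true
        System.out.println(0.09f - 0.07f > 0.02f); // true
    }
}
```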

There is a lot of information about how to deal with floating-point arithmetic, such as Handbook of Floating-Point Arithmetic. It is too large a subject to cover in Stack Overflow questions. There are university courses on it.

Often on today’s typical processors, there is little extra expense for using double rather than float; simple scalar floating-point operations are performed at nearly the same speeds for double and float. Performance differences arise when you have so much data that the time to transfer them (from disk to memory or memory to processor) becomes important, or the space they occupy on disk becomes large, or your software uses SIMD features of processors. (SIMD allows processors to perform the same operation on multiple pieces of data, in parallel. Current processors typically provide about twice the bandwidth for float SIMD operations as for double SIMD operations or do not provide double SIMD operations at all.)



Answer 2:

Double can represent numbers with a larger number of significant digits and a greater range than float, and double computations are more costly in terms of CPU. So it all depends on your application. Binary floating-point cannot exactly represent a number such as 1/5. Such numbers end up being rounded, introducing the errors that are certainly at the origin of your failing asserts. See http://en.m.wikipedia.org/wiki/Floating_point for more details.
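You can see the rounding of 1/5 directly: the BigDecimal(double) constructor exposes the exact binary value that is actually stored for the literal 0.2 (class name OneFifthDemo is my own):

```java
import java.math.BigDecimal;

public class OneFifthDemo {
    public static void main(String[] args) {
        // the exact value of the nearest double to 0.2, slightly above 1/5
        System.out.println(new BigDecimal(0.2));
        // 0.200000000000000011102230246251565404236316680908203125
    }
}
```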

[EDIT] If all else fails run a benchmark:

package doublefloat;

/**
 *
 * @author tarik
 */
public class DoubleFloat {

    /**
     * @param args the command line arguments
     */
    public static void main(String[] args) {
        // TODO code application logic here
        long t1 = System.nanoTime();
        double d = 0.0;
        for (long i=0; i<1000000000;i++) {
            d = d * 1.01;
        }
        long diff1 = System.nanoTime()-t1;
        System.out.println("Double ticks: " + diff1);

        t1 = System.nanoTime();
        float f = 0.0f;
        for (long i=0; i<1000000000;i++) {
            f = f * 1.01f;
        }
        long diff2 = System.nanoTime()-t1;
        System.out.println("Float  ticks: " + diff2);
        System.out.println("Difference %: " + (diff1 - diff2) * 100.0 / diff1);    
    }
}

Output:

Double ticks: 3694029247
Float  ticks: 3355071337
Difference %: 9.175831790592209

This test was run on a PC with an Intel Core 2 Duo. Note that since we are only dealing with a single variable in a tight loop, there is no way to overwhelm the available memory bandwidth. In fact, one of the cores consistently showed 100% CPU during each run. Conclusion: the difference is about 9%, which might indeed be considered negligible.

The second test performs the same computation but over large arrays: about 280 MB for float and 560 MB for double (70,000,000 elements at 4 and 8 bytes each):

package doublefloat;

/**
 *
 * @author tarik
 */
public class DoubleFloat {

    /**
     * @param args the command line arguments
     */
    public static void main(String[] args) {
        final int LOOPS = 70000000;
        long t1 = System.nanoTime();
        double d[] = new double[LOOPS];
        d[0] = 1.0;
        for (int i=1; i<LOOPS;i++) {
            d[i] = d[i-1] * 1.01;
        }
        long diff1 = System.nanoTime()-t1;
        System.out.println("Double ticks: " + diff1);

        t1 = System.nanoTime();
        float f[] = new float[LOOPS];
        f[0] = 1.0f;
        for (int i=1; i<LOOPS;i++) {
            f[i] = f[i-1] * 1.01f;
        }
        long diff2 = System.nanoTime()-t1;
        System.out.println("Float  ticks: " + diff2);
        System.out.println("Difference %: " + (diff1 - diff2) * 100.0 / diff1);    
    }
}

Output:

Double ticks: 667919011
Float  ticks: 349700405
Difference %: 47.64329218950769

Memory bandwidth is overwhelmed, yet I can still see the CPU peaking at 100% for a short period of time.

Conclusion: This benchmark somewhat confirms that using double takes about 9% more time than float in CPU-intensive applications and about 50% more time in data-intensive applications. It also confirms Eric Postpischil's note that the CPU overhead is relatively negligible (9%) in comparison with the performance impact of limited memory bandwidth.