Float and double datatype in Java

2019-01-02 14:45发布

问题:

The float data type is a single-precision 32-bit IEEE 754 floating point and the double data type is a double-precision 64-bit IEEE 754 floating point.

What does it mean? And when should I use float instead of double or vice-versa?

回答1:

The Wikipedia page on it is a good place to start.

To sum up:

  • float is represented in 32 bits, with 1 sign bit, 8 bits of exponent, and 23 bits of the significand (or what follows from a scientific-notation number: 2.33728*1012; 33728 is the significand).

  • double is represented in 64 bits, with 1 sign bit, 11 bits of exponent, and 52 bits of significand.

By default, Java uses double to represent its floating-point numerals (so a literal 3.14 is typed double). It's also the data type that will give you a much larger number range, so I would strongly encourage its use over float.

There may be certain libraries that actually force your usage of float, but in general - unless you can guarantee that your result will be small enough to fit in float's prescribed range, then it's best to opt with double.

If you require accuracy - for instance, you can't have a decimal value that is inaccurate (like 1/10 + 2/10), or you're doing anything with currency (for example, representing $10.33 in the system), then use a BigDecimal, which can support an arbitrary amount of precision and handle situations like that elegantly.



回答2:

A float gives you approx. 6-7 decimal digits precision while a double gives you approx. 15-16. Also the range of numbers is larger for double.

A double needs 8 bytes of storage space while a float needs just 4 bytes.



回答3:

Floating-point numbers, also known as real numbers, are used when evaluating expressions that require fractional precision. For example, calculations such as square root, or transcendentals such as sine and cosine, result in a value whose precision requires a floating-point type. Java implements the standard (IEEE–754) set of floatingpoint types and operators. There are two kinds of floating-point types, float and double, which represent single- and double-precision numbers, respectively. Their width and ranges are shown here:


   Name     Width in Bits   Range 
    double  64              1 .7e–308 to 1.7e+308
    float   32              3 .4e–038 to 3.4e+038


float

The type float specifies a single-precision value that uses 32 bits of storage. Single precision is faster on some processors and takes half as much space as double precision, but will become imprecise when the values are either very large or very small. Variables of type float are useful when you need a fractional component, but don't require a large degree of precision.

Here are some example float variable declarations:

float hightemp, lowtemp;


double

Double precision, as denoted by the double keyword, uses 64 bits to store a value. Double precision is actually faster than single precision on some modern processors that have been optimized for high-speed mathematical calculations. All transcendental math functions, such as sin( ), cos( ), and sqrt( ), return double values. When you need to maintain accuracy over many iterative calculations, or are manipulating large-valued numbers, double is the best choice.



回答4:

Java seems to have a bias towards using double for computations nonetheless:

Case in point the program I wrote earlier today, the methods didn't work when I used float, but now work great when I substituted float with double (in the NetBeans IDE):

package palettedos;
import java.util.*;

class Palettedos{
    private static Scanner Z = new Scanner(System.in);
    public static final double pi = 3.142;

    public static void main(String[]args){
        Palettedos A = new Palettedos();
        System.out.println("Enter the base and height of the triangle respectively");
        int base = Z.nextInt();
        int height = Z.nextInt();
        System.out.println("Enter the radius of the circle");
        int radius = Z.nextInt();
        System.out.println("Enter the length of the square");
        long length = Z.nextInt();
        double tArea = A.calculateArea(base, height);
        double cArea = A.calculateArea(radius);
        long sqArea = A.calculateArea(length);
        System.out.println("The area of the triangle is\t" + tArea);
        System.out.println("The area of the circle is\t" + cArea);
        System.out.println("The area of the square is\t" + sqArea);
    }

    double calculateArea(int base, int height){
        double triArea = 0.5*base*height;
        return triArea;
    }

    double calculateArea(int radius){
        double circArea = pi*radius*radius;
        return circArea;
    }

    long calculateArea(long length){
        long squaArea = length*length;
        return squaArea;
    }
}


回答5:

According to the IEEE standards, float is a 32 bit representation of a real number while double is a 64 bit representation.

In Java programs we normally mostly see the use of double data type. It's just to avoid overflows as the range of numbers that can be accommodated using the double data type is more that the range when float is used.

Also when high precision is required, the use of double is encouraged. Few library methods that were implemented a long time ago still requires the use of float data type as a must (that is only because it was implemented using float, nothing else!).

But if you are certain that your program requires small numbers and an overflow won't occur with your use of float, then the use of float will largely improve your space complexity as floats require half the memory as required by double.



回答6:

This example illustrates how to extract the sign (the leftmost bit), exponent (the 8 following bits) and mantissa (the 23 rightmost bits) from a float in Java.

int bits = Float.floatToIntBits(-0.005f);
int sign = bits >>> 31;
int exp = (bits >>> 23 & ((1 << 8) - 1)) - ((1 << 7) - 1);
int mantissa = bits & ((1 << 23) - 1);
System.out.println(sign + " " + exp + " " + mantissa + " " +
  Float.intBitsToFloat((sign << 31) | (exp + ((1 << 7) - 1)) << 23 | mantissa));

The same approach can be used for double’s (11 bit exponent and 52 bit mantissa).

long bits = Double.doubleToLongBits(-0.005);
long sign = bits >>> 63;
long exp = (bits >>> 52 & ((1 << 11) - 1)) - ((1 << 10) - 1);
long mantissa = bits & ((1L << 52) - 1);
System.out.println(sign + " " + exp + " " + mantissa + " " +
  Double.longBitsToDouble((sign << 63) | (exp + ((1 << 10) - 1)) << 52 | mantissa));

Credit: http://s-j.github.io/java-float/



回答7:

This will give error:

public class MyClass {
    public static void main(String args[]) {
        float a = 0.5;
    }
}

/MyClass.java:3: error: incompatible types: possible lossy conversion from double to float float a = 0.5;

This will work perfectly fine

public class MyClass {
    public static void main(String args[]) {
        double a = 0.5;
    }
}

This will also work perfectly fine

public class MyClass {
    public static void main(String args[]) {
        float a = (float)0.5;
    }
}

Reason : Java by default stores real numbers as double to ensure higher precision.

Double takes more space but more precise during computation and float takes less space but less precise.



标签: