Why double can store bigger numbers than unsigned long long

Posted 2019-01-23 16:54

The question is: I don't quite get why double can store bigger numbers than unsigned long long, since both of them are 8 bytes long, i.e. 64 bits.

In unsigned long long, all 64 bits are used to store the value, whereas double has 1 bit for the sign, 11 for the exponent and 52 for the mantissa. Even if the 52 mantissa bits were used to store plain integers, without any floating point, that is still only 63 bits ...

BUT ULLONG_MAX is significantly smaller than DBL_MAX ...

Why?

6 Answers
SAY GOODBYE
#2 · 2019-01-23 17:24

The reason is that unsigned long long will store exact integers whereas double stores a mantissa (with limited 52-bit precision) and an exponent.

This allows double to store very large numbers (around 10^308), but not exactly. You have about 15 (almost 16) valid decimal digits in a double, and the rest of the 308 possible digits are zeroes (actually undefined, but you can assume "zero" for better understanding).
An unsigned long long only has 20 digits (its maximum is about 1.8*10^19), but every single one of them is exactly defined.

EDIT:
In reply to the comment below asking how this exactly works: you have 1 bit for the sign, 11 bits for the exponent, and 52 bits for the mantissa. The mantissa has an implied "1" bit at the beginning, which is not stored, so effectively you have 53 mantissa bits. 2^53 is about 9.007E15, so you have 15, almost 16, decimal digits to work with.
The exponent is also signed (stored in biased form rather than with a separate sign bit) and can range from -1022 to +1023. It is used to scale (binary shift left or right) the mantissa (2^1023 is around 10^307, hence the limits on range), so very small and very large numbers are equally possible with this format.
But, of course, all numbers that you can represent only have as much precision as will fit into the mantissa.

All in all, floating point numbers are not very intuitive, since "easy" decimal numbers are not necessarily representable as floating point numbers at all. This is due to the fact that the mantissa is binary. For example, it is possible (and easy) to represent any positive integer up to 2^53 (about 9*10^15), or numbers like 0.5, 0.25 or 0.125, with perfect precision.
On the other hand, it is also possible to represent a number like 10^250, but only approximately. In fact, you will find that 10^250 and 10^250 + 1 are the same number (wait, what???). That is because although you can easily have 250 digits, you do not have that many significant digits (read "significant" as "known" or "defined").
Representing something seemingly simple like 0.3, likewise, is only possible approximately, even though 0.3 isn't even a "big" number. You can't represent 0.3 exactly in binary: no matter what binary exponent you attach to it, you will not find any binary mantissa that results in exactly 0.3 (but you can get very close).

Some "special values" are reserved for "infinity" (both positive and negative) as well as "not a number", so you have very slightly less than the total theoretical range.

unsigned long long, on the other hand, does not interpret the bit pattern in any way. Every number that you can represent is simply the exact value encoded by the bit pattern. Every digit of every number is exactly defined; no scaling happens.
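
Here is a quick C sketch of both effects, assuming the usual IEEE 754 binary64 double:

#include <stdio.h>

int main(void) {
    double big = 1e250;
    /* adding 1 is far below the precision available at this magnitude,
       so the sum rounds back to the very same double */
    printf("1e250 + 1 == 1e250 ? %s\n", (big + 1.0 == big) ? "yes" : "no");

    /* 0.3 has no finite binary representation; only the nearest
       representable double is stored */
    printf("0.3 is stored as %.20f\n", 0.3);
    return 0;
}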

混吃等死
#3 · 2019-01-23 17:28

Perhaps you feel that "storing a number in N bits" is something fundamental, whereas there are various ways of doing it. In fact, it is more accurate to say we represent a number in N bits, since the meaning depends on what convention we adopt. We can, in principle, adopt any convention we like about which numbers the different N-bit patterns represent. There is the binary convention, as used for unsigned long long and other integer types, and the mantissa-plus-exponent convention as used for double, but we could also define an (absurd) convention of our own in which, for example, all bits zero means any enormous number you care to specify. In practice we usually use conventions which allow us to combine (add, multiply, etc.) numbers efficiently using the hardware on which we run our programmes.

That said, your question has to be answered by comparing the largest binary N-bit number with the largest number of the form 2^exponent * mantissa, where exponent and mantissa are E- and M-bit binary numbers (with an implicit 1 at the start of the mantissa). That is 2^(2^E - 1) * (2^M - 1), which is typically far greater than 2^N - 1.
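
For the concrete 64-bit types, you can simply print both limits (a minimal C check; on a platform with IEEE 754 binary64 doubles this shows roughly 1.8*10^19 versus 1.8*10^308):

#include <stdio.h>
#include <float.h>   /* DBL_MAX */
#include <limits.h>  /* ULLONG_MAX */

int main(void) {
    printf("ULLONG_MAX = %llu (about %.3e)\n", ULLONG_MAX, (double)ULLONG_MAX);
    printf("DBL_MAX    = %e\n", DBL_MAX);
    return 0;
}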

聊天终结者
#4 · 2019-01-23 17:33

A small example illustrating Damon's and Paxdiablo's explanations:

#include <stdio.h>

int main(void) {
    double d = 2LL<<52;      /* 2LL<<52 == 2^53, the first integer where the double's step exceeds 1 */
    long long ll = 2LL<<52;  /* the same value, held exactly in a 64-bit integer */
    printf("d:%.0f  ll:%lld\n", d, ll);
    d++; ll++;               /* ll gains 1; d rounds back to the same value */
    printf("d:%.0f  ll:%lld\n", d, ll);
}

Output:

d:9007199254740992  ll:9007199254740992
d:9007199254740992  ll:9007199254740993

Both variables would have been incremented the same way with a shift of 51 or less, because every integer up to 2^53 can still be represented exactly in a double.

太酷不给撩
#5 · 2019-01-23 17:34

What kind of magic is happening ???

The same kind of magic that allows you to represent the 101-digit number

10000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 

as

1.0 * 10^100

It's just that instead of base 10 you're doing it in base 2:

0.57149369564113749110789177415267 * 2^333.

This notation allows you to represent very large (or very small) values in a compact manner. Instead of storing every digit, you store the significand (a.k.a. the mantissa or fraction) and the exponent. This way, numbers that are hundreds of decimal digits long can be represented in a format that takes up only 64 bits.

It is the exponent that allows floating-point numbers to represent such a large range of values. An exponent like 1024 requires only 11 bits to store, but 2^1024 is a 309-digit number.

The tradeoff is that not every value can be represented exactly. With a 64-bit integer, every value between 0 and 2^64 - 1 (or -2^63 to 2^63 - 1) has an exact representation. That is not true of floating-point numbers, for several reasons. First of all, you only have so many bits, giving you only so many digits of precision. For example, if you only have 3 significant digits, then you cannot represent values between 0.123 and 0.124, or 1.23 and 1.24, or 123 and 124, or 1230000 and 1240000. As you approach the edge of your range, the gap between representable values gets larger.

Secondly, just like there are values that cannot be represented in a finite number of decimal digits (1/3 gives the non-terminating sequence 0.33333... in base 10), there are values that cannot be represented in a finite number of bits (1/10 gives the non-terminating sequence 0.000110011001100... in base 2).
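
Both points are easy to observe in C; a minimal sketch (the exact digits printed depend on the platform's double format):

#include <stdio.h>
#include <math.h>

int main(void) {
    /* 1/10 has no finite binary expansion, so only the nearest
       representable double is stored */
    printf("0.1 is stored as %.25f\n", 0.1);

    /* the gap between adjacent representable doubles grows with magnitude */
    printf("gap near 1.0:  %g\n", nextafter(1.0, INFINITY) - 1.0);
    printf("gap near 1e16: %g\n", nextafter(1e16, INFINITY) - 1e16);
    return 0;
}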

叼着烟拽天下
#6 · 2019-01-23 17:39

IEEE754 floating point values can store a larger range of numbers simply because they sacrifice precision.

By that, I mean that a 64-bit integral type can represent every single value in its range but a 64-bit double cannot.

For example, trying to store 0.1 into a double won't actually give you 0.1; it'll give you something like:

0.100000001490116119384765625

(that's actually the nearest single-precision value, but the same effect applies to double precision).


But, if the question is "how do you get a larger range with fewer bits available to you?", it's simply that some of those bits are used to scale the value.

Classic example: let's say you have four decimal digits to store a value. With an integer, you can represent the numbers 0000 through 9999 inclusive. The precision within that range is perfect: you can represent every integral value.

However, let's go floating point and use the last digit as a scale, so that the digits 1234 actually represent the number 123 x 10^4.

So now your range is from 0 (represented by 0000 through 0009) through 999,000,000,000 (represented by 9999, meaning 999 x 10^9).

But you cannot represent every number within that range. For example, 123,456 cannot be represented; the closest you can get is with the digits 1233, which give you 123,000. And, in fact, where the integer values had a precision of four digits, now you only have three.

That's basically how IEEE754 works, sacrificing precision for range.
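
For the curious, that four-digit scheme can be written as a toy C sketch (decode is an invented helper for illustration, not any real API):

#include <stdio.h>

/* Toy "decimal float": the first three digits are the mantissa and the
   last digit is a power-of-ten exponent, so 1234 encodes 123 x 10^4. */
static long long decode(int digits) {
    long long value = digits / 10;  /* the three mantissa digits */
    int exponent = digits % 10;     /* the scale digit */
    while (exponent-- > 0)
        value *= 10;
    return value;
}

int main(void) {
    printf("1234 -> %lld\n", decode(1234));  /* 1230000 */
    printf("9999 -> %lld\n", decode(9999));  /* 999000000000 */
    printf("1233 -> %lld\n", decode(1233));  /* 123000, the closest we get to 123456 */
    return 0;
}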

走好不送
#7 · 2019-01-23 17:39

Disclaimer

This is an attempt to provide an easy-to-understand explanation of how floating point encoding works. It is a simplification and does not cover any of the technical aspects of the real IEEE 754 floating point standard (normalization, signed zero, infinities, NaNs, rounding, etc.). However, the idea presented here is correct.


Understanding how floating point numbers work is severely impeded by the fact that computers work with numbers in base 2, while humans don't handle base 2 easily. I'll try to explain how floating point numbers work using base 10.

Let's construct a floating point number representation using signs and base 10 digits (i.e. the usual digits from 0 to 9 we are using on a daily basis).

Let's say we have 10 square cells and each cell can hold either a sign (+ or -) or a decimal digit (0, 1, 2, 3, 4, 5, 6, 7, 8 or 9).

We can use the 10 digits to store signed integer numbers. One digit for the sign and 9 digits for the value:

sign -+   +-------- 9 decimal digits -----+
      v   v                               v
    +---+---+---+---+---+---+---+---+---+---+
    | + | 0 | 0 | 0 | 0 | 0 | 1 | 5 | 0 | 0 |
    +---+---+---+---+---+---+---+---+---+---+

This is how the value 1500 is represented as an integer.

We can also use them to store floating point numbers. For example, 7 cells for the mantissa (its sign and 6 digits) and 3 cells for the exponent (its sign and 2 digits):

  +------ sign digits --------+
  v                           v
+---+---+---+---+---+---+---+---+---+---+
| + | 0 | 0 | 0 | 1 | 5 | 0 | + | 0 | 1 |
+---+---+---+---+---+---+---+---+---+---+
|<-------- Mantissa ------->|<-- Exp -->|       

This is one of the possible representations of 1500 as a floating point value (using our 10-cell decimal representation).

The value of the mantissa (M) is +150 and the value of the exponent (E) is +1. The value represented above is:

V = M * 10^E = 150 * 10^1 = 1500

The ranges

The integer representation can store signed values between -(10^9-1) (-999,999,999) and +(10^9-1) (+999,999,999). Moreover, it can represent each and every integer value between these limits. Better still, there is a single representation for each value, and it is exact.

The floating point representation can store signed values for mantissa (M) between -999,999 and +999,999 and for exponent (E) between -99 and +99.

It can store values between -999,999*10^99 and +999,999*10^99. These numbers have 105 digits, much more than the 9 digits of the biggest numbers represented as integers above.

The loss of precision

Let's remark that for integer values, M stores the sign and the first 6 digits of the value (or fewer), and E is the number of digits that did not fit into M.

V = M * 10^E

Let's try to represent V = +987,654,321 using our floating point encoding.

Because M is limited to +999,999, it can only store +987,654, and E will be +3 (the last 3 digits of V cannot fit in M).

Putting them together:

+987,654 * 10^(+3) = +987,654,000

This is not our original value of V but the best approximation we can get using this representation.

Let's remark that all the numbers between (and including) +987,654,000 and +987,654,999 are approximated using the same value (M=+987,654, E=+3). Also there is no way to store decimal digits for numbers greater than +999,999.

As a general rule, for numbers bigger than the maximum value of M (+999,999), this method produces the same representation for all values between M*10^E and (M+1)*10^E - 1 (integer or real values, it doesn't matter).
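
As a rough C sketch of this truncating scheme (encode and decode are invented helpers; real IEEE 754 works in base 2 and rounds instead of truncating):

#include <stdio.h>

/* keep at most 6 significant digits, counting the dropped magnitude in *e
   (assumes v >= 0 for brevity) */
static void encode(long long v, long long *m, int *e) {
    *e = 0;
    while (v > 999999) {
        v /= 10;   /* drop the last digit... */
        (*e)++;    /* ...and remember the lost magnitude */
    }
    *m = v;
}

static long long decode(long long m, int e) {
    while (e-- > 0)
        m *= 10;
    return m;
}

int main(void) {
    long long m;
    int e;
    encode(987654321LL, &m, &e);
    printf("987654321 -> M=%lld, E=%d -> %lld\n", m, e, decode(m, e));
    return 0;
}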

Conclusion

For large values (larger than the maximum value of M), the floating point representation has gaps between the numbers it can represent. These gaps become bigger and bigger as the value of E increases.

The entire idea of "floating point" is to store the dozen or so most significant digits (the beginning of the number) together with the magnitude of the number.

Let's take the speed of light as an example. Its value is about 300,000 km/s. The number is so big that, for most practical purposes, you don't care whether it's 300,000.001 km/s or 300,000.326 km/s.

In fact, it is not even that big, a better approximation is 299,792.458 km/s.

The floating point representation extracts the important characteristics of the speed of light: its magnitude is hundreds of thousands of km/s (E=5) and its leading figure is 3 (hundreds of thousands of km/s).

speed of light = 3*10^5 km/s

Our floating point representation can approximate it by: 299,792 km/s (M=299,792, E=0).
