Representable result of floor() and ceil()

Posted 2020-04-05 19:46

Question:

For an arbitrary value 'v' of a floating point type (float/double/long double), does C89 guarantee that the mathematically exact integer result of floor(v) and ceil(v) is a representable value of the type of 'v'?

Does any of the later C or C++ standards guarantee this?

Does IEEE 754 guarantee this?

Answer 1:

This is guaranteed by the construction of IEEE-754 numbers. (To be clear: C does not require IEEE-754, but the following analysis also holds for every other floating-point format I am familiar with; the crucial property is that all sufficiently large numbers in the format are integers.)


Recall that a normal IEEE-754 number has the form ±1.xxx...xxx * 2^n, where the width of the significand field (the xxx...xxx part) is defined by the type of the number (23 binary digits for single precision, 52 binary digits for double precision). All such numbers with an exponent (n) within the allowed range are representable.
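To make the ±1.xxx...xxx * 2^n form concrete, here is a minimal sketch that recovers the significand and exponent of a double with the standard frexp function. The helper name split_normal is hypothetical, and the code assumes v is finite and nonzero.

```c
#include <assert.h>
#include <math.h>

/* Splits a finite, nonzero v into a significand s in [1, 2) and an
   exponent n such that |v| == s * 2^n. The sign is dropped. */
double split_normal(double v, int *n)
{
    int e;
    double m;

    assert(v != 0.0);
    m = frexp(fabs(v), &e);  /* m in [0.5, 1), |v| == m * 2^e */
    *n = e - 1;              /* shift to the 1.xxx convention */
    return m * 2.0;          /* exact: multiplying by 2 only changes the exponent */
}
```

For example, split_normal(6.5, &n) returns 1.625 with n == 2, since 6.5 = 1.625 * 2^2 (binary 1.101 * 2^2).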

Assume WLOG that v is positive (if v were negative, we could swap ceil and floor in the following analysis).

Let v have k significant bits, and write v out as a binary fixed point number; there are three possibilities:

Case 1: All significand bits are integral. When we write out v, it looks like this

xxxxxxxxxxxxxxxxxxxxxxxx000000...00000.0

then v is an integer, and so ceil(v) = floor(v) = v, and so both are trivially representable.

Case 2: All significand bits are fractional. When we write out v, it looks like

0.000000...00000xxxxxxxxxxxxxxxxxxxxxxxx

then v is in the range [0,1), and so floor(v) = 0, which is representable, and ceil(v) is 0 if v = 0 and 1 otherwise, both of which are representable.

Case 3: v contains both integral and fractional significand bits:

xxxxxxxxxxxxxx.xxxxxxxxxx

then floor(v) is just:

xxxxxxxxxxxxxx.

Because we have thrown away at least one fractional bit, floor(v) has at most k-1 significant bits and the same exponent as v, so it is representable.
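The "throw away the fractional bits" step can be sketched directly on the bit pattern. This is only an illustration, not how any particular libm implements floor: it assumes C99 (for <stdint.h>), an IEEE-754 binary64 double, and a finite v >= 1. The name floor_by_masking is made up for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Floor of a finite v >= 1 by clearing the fractional significand bits.
   Assumption: double is IEEE-754 binary64 (52 stored significand bits,
   11-bit exponent with bias 1023). */
double floor_by_masking(double v)
{
    uint64_t bits;
    int n;

    assert(v >= 1.0);
    memcpy(&bits, &v, sizeof bits);
    n = (int)((bits >> 52) & 0x7FF) - 1023;    /* unbiased exponent */
    if (n >= 52)
        return v;                              /* Case 1: no fractional bits */
    bits &= ~((UINT64_C(1) << (52 - n)) - 1);  /* clear the 52 - n fractional bits */
    memcpy(&v, &bits, sizeof v);
    return v;                                  /* same exponent, fewer significant bits */
}
```

The exponent field is never touched, which is the point of the argument: the result is v's own bit pattern with some low bits zeroed, so it is necessarily a value of the same format.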

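If the fractional bits of v are all zero, then v is an integer and ceil(v) = floor(v) = v, so ceil(v) is representable. Otherwise, ceil(v) = floor(v) + 1. Either the sum fits in the same integral bit positions as floor(v), so it has at most k-1 significant bits, or the addition carries out of the top bit and the result is a power of two with an exponent one larger than v's. Since v has fractional bits, that exponent is still far below the maximum, so ceil(v) is representable in both cases.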