Is comparing an underflowed, unsigned integer to -

2020-05-22 04:34发布

问题:

Consider the following:

size_t r = 0;
r--;
const bool result = (r == -1);

Does the comparison whose result initialises result have well-defined behaviour?
And is its result true, as I'd expect?


This Q&A was written because I was unsure of two factors in particular.
They may both be identified by use of the term "crucial[ly]" in my answer.

This example is inspired by an approach for loop conditions when the counter is unsigned:
for (size_t r = m.size() - 1; r != -1; r--)

回答1:

size_t r = 0;
r--;
const bool result = (r == -1);

Strictly speaking, the value of result is implementation-defined. In practice, it's almost certain to be true; I'd be surprised if there were an implementation where it's false.

The value of r after r-- is the value of SIZE_MAX, a macro defined in <stddef.h> / <cstddef>.

For the comparison r == -1, the usual arithmetic conversions are performed on both operands. The first step in the usual arithmetic conversions is to apply the integral promotions to both operands.

r is of type size_t, an implementation-defined unsigned integer type. -1 is an expression of type int.

On most systems, size_t is at least as wide as int. On such systems, the integral promotions cause the value of r either to be converted to unsigned int or to keep its existing type (the former can happen if size_t has the same width as int, but a lower conversion rank). Now the left operand (which is unsigned) has at least the rank of the right operand (which is signed). The right operand is converted to the type of the left operand. This conversion yields the same value as r, and so the equality comparison yields true.

That's the "normal" case.

Suppose we have an implementation where size_t is 16 bits (let's say it's a typedef for unsigned short) and int is 32 bits. So SIZE_MAX == 65535 and INT_MAX == 2147483647. Or we could have a 32-bit size_t and a 64-bit int. I doubt that any such implementation exists, but nothing in the standard forbids it (see below).

Now the left side of the comparison has type size_t and value 65535. Since signed int can represent all the values of type size_t, the integral promotions convert the value to 65535 of type int. Both side of the == operator have type int, so the usual arithmetic conversions have nothing to do. The expression is equivalent to 65535 == -1, which is clearly false.

As I mentioned, this kind of thing is unlikely to happen with an expression of type size_t -- but it can easily happen with narrower unsigned types. For example, if r is declared as an unsigned short, or an unsigned char, or even a plain char on a system where that type is signed, the value of result will probably be false. (I say probably because short or even unsigned char can have the same width as int, in which case result will be true.)

In practice, you can avoid the potential problem by doing the conversion explicitly rather than relying on the implementation-defined usual arithmetic conversions:

const bool result = (r == (size_t)-1);

or

const bool result = (r == SIZE_MAX);

C++11 standard references:

  • 5.10 [expr.eq] Equality operators
  • 5.9 [expr.rel] Relational operators (specifies that the usual arithmetic conversions are performed)
  • 5 [expr] Expressions, paragraph 9: Usual arithmetic conversions
  • 4.5 [conv.prom] Integral promotions
  • 18.2 [support.types] size_t

18.2 paragraphs 6-7:

6 The type size_t is an implementation-defined unsigned integer type that is large enough to contain the size in bytes of any object.

7 [ Note: It is recommended that implementations choose types for ptrdiff_t and size_t whose integer conversion ranks (4.13) are no greater than that of signed long int unless a larger size is necessary to contain all the possible values. — end note ]

So there's no prohibition on making size_t narrower than int. I can almost plausibly imagine a system where int is 64 bits, but no single object can be bigger than 232-1 bytes so size_t is 32 bits.



回答2:

Yes, and the result is what you would expect.

Let's break it down.

What is the value of r at this point? Well, the underflow is well-defined and results in r taking on its maximum value by the time the comparison is run. std::size_t has no specific known bounds, but we can make reasonable assumptions about its range when compared to that of an int:

std::size_t is the unsigned integer type of the result of the sizeof operator. [..] std::size_t can store the maximum size of a theoretically possible object of any type (including array).

And, just to get it out of the way, the expression -1 is unary - applied to the literal 1, and has type int on any system:

[C++11: 2.14.2/2]: The type of an integer literal is the first of the corresponding list in Table 6 in which its value can be represented. [..]

(I won't cite all the text that describes how applying unary - to an int results in an int, but it does.)

It's more than reasonable to suggest that, on the majority of systems, an int is not going to be able to hold std::numeric_limits<std::size_t>::max().

Now, what happens to those operands?

[C++11: 5.10/1]: The == (equal to) and the != (not equal to) operators have the same semantic restrictions, conversions, and result type as the relational operators except for their lower precedence and truth-value result. [..]

[C++11: 5.9/2]: The usual arithmetic conversions are performed on operands of arithmetic or enumeration type. [..]

Let's examine these "usual arithmetic conversions":

[C++11: 5/9]: Many binary operators that expect operands of arithmetic or enumeration type cause conversions and yield result types in a similar way. The purpose is to yield a common type, which is also the type of the result.

This pattern is called the usual arithmetic conversions, which are defined as follows:

  • If either operand is of scoped enumeration type (7.2), no conversions are performed; if the other operand does not have the same type, the expression is ill-formed.
  • If either operand is of type long double, the other shall be converted to long double`.
  • Otherwise, if either operand is double, the other shall be converted to double.
  • Otherwise, if either operand is float, the other shall be converted to float.
  • Otherwise, the integral promotions (4.5) shall be performed on both operands.59 Then the following rules shall be applied to the promoted operands:
    • If both operands have the same type, no further conversion is needed.
    • Otherwise, if both operands have signed integer types or both have unsigned integer types, the operand with the type of lesser integer conversion rank shall be converted to the type of the operand with greater rank.
    • Otherwise, if the operand that has unsigned integer type has rank greater than or equal to the rank of the type of the other operand, the operand with signed integer type shall be converted to the type of the operand with unsigned integer type.
    • Otherwise, if the type of the operand with signed integer type can represent all of the values of the type of the operand with unsigned integer type, the operand with unsigned integer type shall be converted to the type of the operand with signed integer type.
    • Otherwise, both operands shall be converted to the unsigned integer type corresponding to the type of the operand with signed integer type.

I've highlighted the passage that takes effect here and, as for why:

[C++11: 4.13/1]: Every integer type has an integer conversion rank defined as follows

  • [..]
  • The rank of long long int shall be greater than the rank of long int, which shall be greater than the rank of int, which shall be greater than the rank of short int, which shall be greater than the rank of signed char.
  • The rank of any unsigned integer type shall equal the rank of the corresponding signed integer type.
  • [..]

All integral types, even the fixed-width ones, are composed of the standard integral types; therefore, logically, std::size_t must be unsigned long long, unsigned long, or unsigned int.

  • If std::size_t is unsigned long long, or unsigned long, then the rank of std::size_t is greater than the rank of unsigned int and, therefore, also of int.

  • If std::size_t is unsigned int, the rank of std::size_t is equal to the rank of unsigned int and, therefore, also of int.

Either way, per the usual arithmetic conversions, the signed operand is converted to the type of the unsigned operand (and, crucially, not the other way around!). Now, what does this conversion entail?

[C++11: 4.7/2]: If the destination type is unsigned, the resulting value is the least unsigned integer congruent to the source integer (modulo 2n where n is the number of bits used to represent the unsigned type). [ Note: In a two’s complement representation, this conversion is conceptual and there is no change in the bit pattern (if there is no truncation). —end note ]

[C++11: 4.7/3]: If the destination type is signed, the value is unchanged if it can be represented in the destination type (and bit-field width); otherwise, the value is implementation-defined.

This means that std::size_t(-1) is equivalent to std::numeric_limits<std::size_t>::max(); it's crucial that the value n in the above clause relates to the number of bits used to represent the unsigned type, not the source type. Otherwise, we'd be doing std::size_t((unsigned int)-1), which is not the same thing at all — it could be many orders of magnitude smaller than our desired value!

Indeed, now that we know the conversions are all well-defined, we can test this value:

std::cout << (std::size_t(-1) == std::numeric_limits<size_t>::max()) << '\n';
// "1"

And, just to illustrate my point from earlier, on my 64-bit system:

std::cout << std::is_same<unsigned long, std::size_t>::value << '\n';
std::cout << std::is_same<unsigned long, unsigned int>::value << '\n';
std::cout << std::hex << std::showbase
          << std::size_t(-1) << ' '
          << std::size_t(static_cast<unsigned int>(-1)) << '\n';
// "1"
// "0"
// "0xffffffffffffffff 0xffffffff"