Does accessing an int with a char * potentially ha

Posted 2020-08-18 02:56

Question:

The code below for testing endianness is expected to have implementation defined behavior:

int is_little_endian(void) {
    int x = 1;
    char *p = (char*)&x;
    return *p == 1;
}

But is it possible that it may have undefined behavior on purposely contrived architectures? For example could the first byte of the representation of an int with value 1 (or another well chosen value) be a trap value for the char type?

As noted in comments, the type unsigned char would not have this issue as it cannot have trap values, but this question specifically concerns the char type.
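For comparison, here is a minimal sketch of the unsigned char variant mentioned above. The function names are made up for illustration. The second function reaches the same byte through memcpy instead of a pointer cast.

```c
#include <string.h>

/* Variant of the question's test that reads the first byte of x through
   unsigned char, a type that has no trap representations. */
int is_little_endian_uchar(void) {
    int x = 1;
    const unsigned char *p = (const unsigned char *)&x;
    return *p == 1;
}

/* Same check, copying the first byte of x into an unsigned char object. */
int is_little_endian_memcpy(void) {
    int x = 1;
    unsigned char first;
    memcpy(&first, &x, 1);
    return first == 1;
}
```

Both functions inspect the same byte, so they should agree on any implementation; which value they return is still implementation-defined.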

Answer 1:

Per C 2018 6.2.5 15, char behaves as either signed char or unsigned char. Suppose it is signed char. 6.2.6.2 2 discusses signed integer types, including signed char. At the end of this paragraph, it says:

Which of these [sign and magnitude, two’s complement, or ones’ complement] applies is implementation-defined, as is whether the value with sign bit 1 and all value bits zero (for the first two), or with sign bit and all value bits 1 (for ones’ complement), is a trap representation or a normal value.

Thus, this paragraph allows signed char to have a trap representation. However, the paragraph in the standard that says accessing trap representations may have undefined behavior, 6.2.6.1 5, specifically excludes character types:

Certain object representations need not represent a value of the object type. If the stored value of an object has such a representation and is read by an lvalue expression that does not have character type, the behavior is undefined. If such a representation is produced by a side effect that modifies all or any part of the object by an lvalue expression that does not have character type, the behavior is undefined. Such a representation is called a trap representation.

Thus, although char may have trap representations, this paragraph does not make it undefined to read one through an lvalue of character type. That leaves the question of what happens when the value is used in an expression. If a char holds a trap representation, it does not represent a value, so comparing it to 1 in *p == 1 does not seem to have defined behavior.

The specific value of 1 in an int will not result in a trap representation in char for any normal C implementation, as the 1 will be in the “rightmost” (lowest valued) bit of some byte of the int, and no normal C implementation puts the sign bit of a char in the bit in that position. However, the C standard apparently does not prohibit such an arrangement, so, theoretically, an int with value 1 might be encoded with bits 00000001 in one of its bytes, and those bits might be a trap representation for a char.
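To make the candidate patterns from 6.2.6.2 2 concrete, the sketch below writes them out for a hypothetical 8-bit signed char. It assumes no padding bits and the sign bit in the most significant position. The enum and function names are invented for illustration and do not come from the standard.

```c
/* Hypothetical names, for illustration only. */
enum sign_format { SIGN_MAGNITUDE, TWOS_COMPLEMENT, ONES_COMPLEMENT };

/* The single 8-bit pattern that 6.2.6.2 2 lets an implementation treat as a
   trap representation, ASSUMING an 8-bit signed char with no padding bits
   and the sign bit as the most significant bit. */
unsigned candidate_trap_pattern(enum sign_format f) {
    switch (f) {
    case SIGN_MAGNITUDE:  return 0x80u; /* sign bit 1, value bits 0: "negative zero" */
    case TWOS_COMPLEMENT: return 0x80u; /* sign bit 1, value bits 0: otherwise -128 */
    case ONES_COMPLEMENT: return 0xFFu; /* sign bit and value bits all 1: "negative zero" */
    }
    return 0u; /* not reached for valid enum values */
}

/* 1 if the low 8 bits of byte match the candidate pattern, else 0. */
int byte_may_trap(unsigned byte, enum sign_format f) {
    return (byte & 0xFFu) == candidate_trap_pattern(f);
}
```

Under these assumptions byte_may_trap(0x01u, f) is 0 for every format, which matches the point that only an unusual placement of the sign bit could turn 00000001 into a trap representation.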



Answer 2:

I don't think the Standard would forbid an implementation in which signed char used sign-magnitude or ones'-complement format, and trapped on attempts to load the bit pattern that would represent "negative zero". Nor does it require that such implementations must make char unsigned. It would be possible to contrive an architecture upon which your code could have arbitrary behavior. A few more important things to note, however:

  1. There is no guarantee that the bits within a char are mapped in the same sequence as the ones in an int. The code wouldn't launch into UB-land if the bits aren't mapped in order, but the result would not be very meaningful.

  2. So far as I can tell, every non-contrived conforming C99 implementation has used two's-complement format; I consider it doubtful that any will ever do otherwise.

  3. It would be silly for an implementation to make char be a type with fewer representable values than bit patterns.

  4. One could contrive a conforming implementation that would do almost anything with almost any source text, provided that there exists some source text that it would process in the fashion defined by the Standard.

One could contrive a conforming sign-magnitude implementation where the integer value 1 would have a bit pattern that would encode signed char value "negative zero", and which would trap on an attempt to load that. One could even contrive a conforming ones'-complement implementation that did so (have lots of padding bits on the "int" type, all of which get set when storing the value "1"). Given that one could contrive a conforming implementation that uses the One Program rule to justify doing anything it liked with the above source text regardless of what integer format it uses, however, I don't think the possibility of weird char type should really be a worry.
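To see how a given implementation actually lays out the value 1, the object representation can be copied into unsigned char storage and inspected byte by byte. This is only a sketch and the helper names are hypothetical. On a contrived implementation with padding bits or an unusual bit order, no byte need be equal to 1 at all.

```c
#include <stddef.h>
#include <string.h>

/* Copy the object representation of value into out, which must have room
   for sizeof(int) bytes. unsigned char is used so no byte can trap. */
void int_object_bytes(int value, unsigned char *out) {
    memcpy(out, &value, sizeof value);
}

/* Index of the first byte equal to 1 in the representation of (int)1,
   or -1 if no byte has that value. */
int index_of_byte_one(void) {
    unsigned char bytes[sizeof(int)];
    size_t i;

    int_object_bytes(1, bytes);
    for (i = 0; i < sizeof bytes; i++) {
        if (bytes[i] == 1) {
            return (int)i;
        }
    }
    return -1;
}
```

A result of 0 corresponds to what the question's function calls little-endian; any other result, including -1, is something the original test would report as "not little-endian" without distinguishing the cases.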

Note, btw, that the Standard makes no effort to forbid silly implementations; it might be improved by adding language mandating that char must either be a two's-complement type with no trap representations or an unsigned type, and either mandating the same for signed char or explicitly saying that is not required. It might also be improved if it recognized a category of implementations which can't support types like unsigned long long [which would be a major stumbling block for 36-bit ones'-complement systems, and may be the reason that no conforming C99 implementations exist for such platforms].



Answer 3:

I found a quote from the Standard that proves that no object representation is a trap value for unsigned char:

6.2.6.2 Integer types

1 For unsigned integer types other than unsigned char, the bits of the object representation shall be divided into two groups: value bits and padding bits (there need not be any of the latter). If there are N value bits, each bit shall represent a different power of 2 between 1 and 2^(N−1), so that objects of that type shall be capable of representing values from 0 to 2^N − 1 using a pure binary representation; this shall be known as the value representation. The values of any padding bits are unspecified. 53)

The paragraph above implies that an unsigned char cannot have any padding bits.

The following footnote says that padding bits are what can be used for trap representations.

53) Some combinations of padding bits might generate trap representations, for example, if one padding bit is a parity bit. Regardless, no arithmetic operation on valid values can generate a trap representation other than as part of an exceptional condition such as an overflow, and this cannot occur with unsigned types. All other combinations of padding bits are alternative object representations of the value specified by the value bits.

So I guess the answer is that unsigned char is guaranteed to have no trap values, but char is not.
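Some of these properties can be probed from <limits.h>. The sketch below uses hypothetical function names and only reports what the macros say; it cannot prove whether any bit pattern actually traps.

```c
#include <limits.h>

/* 1 if plain char behaves as a signed type on this implementation. */
int plain_char_is_signed(void) {
    return CHAR_MIN < 0;
}

/* 1 if UCHAR_MAX has bit CHAR_BIT-1 set and nothing above it, i.e.
   UCHAR_MAX == 2^CHAR_BIT - 1, so every bit of unsigned char is a value bit. */
int uchar_uses_all_patterns(void) {
    return (UCHAR_MAX >> (CHAR_BIT - 1)) == 1;
}

/* 1 if signed char has a symmetric range. Assuming no padding bits, that
   leaves one bit pattern over, which is either "negative zero" or a trap
   representation. */
int schar_range_is_symmetric(void) {
    return SCHAR_MIN == -SCHAR_MAX;
}
```

On a typical two's-complement implementation without trap representations, schar_range_is_symmetric() returns 0 and uchar_uses_all_patterns() returns 1; the result of plain_char_is_signed() varies by platform.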