The code below for testing endianness is expected to have implementation-defined behavior:
int is_little_endian(void) {
    int x = 1;
    char *p = (char*)&x;
    return *p == 1;
}
But is it possible that it may have undefined behavior on purposely contrived architectures? For example, could the first byte of the representation of an int with value 1 (or another well chosen value) be a trap value for the char type?
As noted in comments, the type unsigned char would not have this issue, as it cannot have trap values, but this question specifically concerns the char type.
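For comparison, here is a minimal sketch of the same test read through unsigned char (the function name is just chosen here to distinguish it); since unsigned char has no trap representations, this variant raises no such question:

int is_little_endian_unsigned(void) {
    int x = 1;
    unsigned char *p = (unsigned char *)&x; /* every bit pattern is a valid unsigned char */
    return *p == 1;
}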
Per C 2018 6.2.5 15, char behaves as either signed char or unsigned char. Suppose it is signed char. 6.2.6.2 2 discusses signed integer types, including signed char. At the end of this paragraph, it says:
Which of these [sign and magnitude, two’s complement, or ones’ complement] applies is implementation-defined, as is whether the value with sign bit 1 and all value bits zero (for the first two), or with sign bit and all value bits 1 (for ones’ complement), is a trap representation or a normal value.
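Concretely, for an 8-bit signed char (one sign bit, seven value bits), the two patterns the quoted sentence singles out are:

10000000 - sign bit 1, all value bits zero (sign and magnitude, or two's complement)
11111111 - sign bit and all value bits 1 (ones' complement)

Each may be either a trap representation or a normal value, at the implementation's choice.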
Thus, this paragraph allows signed char to have a trap representation. However, the paragraph in the standard that says accessing trap representations may have undefined behavior, 6.2.6.1 5, specifically excludes character types:
Certain object representations need not represent a value of the object type. If the stored value of an object has such a representation and is read by an lvalue expression that does not have character type, the behavior is undefined. If such a representation is produced by a side effect that modifies all or any part of the object by an lvalue expression that does not have character type, the behavior is undefined. Such a representation is called a trap representation.
Thus, although char may have trap representations, there is no reason we should not be able to access it. There is then the question of what happens if we use the value in an expression. If a char has a trap representation, it does not represent a value, so attempting to compare it to 1 in *p == 1 does not seem to have defined behavior.
The specific value of 1 in an int will not result in a trap representation in char for any normal C implementation, as the 1 will be in the “rightmost” (lowest valued) bit of some byte of the int, and no normal C implementation puts the sign bit of a char in that bit position. However, the C standard apparently does not prohibit such an arrangement, so, theoretically, an int with value 1 might be encoded with bits 00000001 in one of its bytes, and those bits might be a trap representation for a char.
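If you want to see what an implementation actually stores, you can dump the object representation through unsigned char, which sidesteps the trap question entirely; a minimal sketch:

#include <stdio.h>

int main(void) {
    int x = 1;
    unsigned char *p = (unsigned char *)&x; /* reading bytes as unsigned char is always defined */
    for (size_t i = 0; i < sizeof x; i++)
        printf("byte %zu: 0x%02x\n", i, (unsigned)p[i]);
    return 0;
}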
I don't think the Standard would forbid an implementation in which signed char used sign-magnitude or ones'-complement format, and trapped on attempts to load the bit pattern that would represent "negative zero". Nor does it require that such implementations make char unsigned. It would be possible to contrive an architecture upon which your code could have arbitrary behavior. A few more important things to note, however:
- There is no guarantee that the bits within a char are mapped in the same sequence as the ones in an int. The code wouldn't launch into UB-land if the bits aren't mapped in order, but the result would not be very meaningful (a fuller check is sketched after this list).
- So far as I can tell, every non-contrived conforming C99 implementation has used two's-complement format; I consider it doubtful that any will ever do otherwise.
- It would be silly for an implementation to make char be a type with fewer representable values than bit patterns.
- One could contrive a conforming implementation that would do almost anything with almost any source text, provided that there exists some source text that it would process in the fashion defined by the Standard.
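On the bit/byte-order point above: rather than trusting a single byte, one can classify the byte order by checking every byte, again through unsigned char. A sketch, assuming CHAR_BIT is 8 and a 4-byte unsigned int (both assumptions, not guarantees):

#include <stdio.h>

int main(void) {
    unsigned int x = 0x01020304u;
    const unsigned char *p = (const unsigned char *)&x;
    if (sizeof x == 4) {
        if (p[0] == 4 && p[1] == 3 && p[2] == 2 && p[3] == 1)
            puts("little-endian");
        else if (p[0] == 1 && p[1] == 2 && p[2] == 3 && p[3] == 4)
            puts("big-endian");
        else
            puts("some other byte order"); /* e.g. a middle-endian layout */
    }
    return 0;
}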
One could contrive a conforming sign-magnitude implementation where the integer value 1 would have a bit pattern that would encode signed char value "negative zero", and which would trap on an attempt to load that. One could even contrive a conforming ones'-complement implementation that did so (have lots of padding bits on the int type, all of which get set when storing the value 1). Given that one could contrive a conforming implementation that uses the One Program rule to justify doing anything it liked with the above source text regardless of what integer format it uses, however, I don't think the possibility of a weird char type should really be a worry.
Note, btw, that the Standard makes no effort to forbid silly implementations; it might be improved by adding language mandating that char must either be a two's-complement type with no trap representations or an unsigned type, and either mandating the same for signed char or explicitly saying that is not required. It might also be improved if it recognized a category of implementations that cannot support types like unsigned long long [which would be a major stumbling block for 36-bit ones'-complement systems, and may be the reason that no conforming C99 implementations exist for such platforms].
I found a quote from the Standard that proves that no object representation is a trap value for unsigned char:
6.2.6.2 Integer types
1 For unsigned integer types other than unsigned char, the bits of the object representation shall be divided into two groups: value bits and padding bits (there need not be any of the latter). If there are N value bits, each bit shall represent a different power of 2 between 1 and 2^(N−1), so that objects of that type shall be capable of representing values from 0 to 2^N − 1 using a pure binary representation; this shall be known as the value representation. The values of any padding bits are unspecified.53)
An unsigned char cannot have any padding bits: 6.2.6.1 3 requires values of type unsigned char to be represented using a pure binary notation, and its accompanying footnote says those values range from 0 to 2^CHAR_BIT − 1, which uses every bit as a value bit.
The following footnote says that padding bits are what can generate trap representations:
53) Some combinations of padding bits might generate trap representations, for example, if one padding bit is a parity bit. Regardless, no arithmetic operation on valid values can generate a trap representation other than as part of an exceptional condition such as an overflow, and this cannot occur with unsigned types. All other combinations of padding bits are alternative object representations of the value specified by the value bits.
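This consequence can even be checked at compile time: with no padding bits, UCHAR_MAX must use all CHAR_BIT bits. A sketch using C11 _Static_assert (the 1ull shift assumes CHAR_BIT < 64, which is an assumption, not a guarantee):

#include <limits.h>

/* If unsigned char had padding bits, UCHAR_MAX would be smaller than 2^CHAR_BIT - 1. */
_Static_assert(UCHAR_MAX == (1ull << CHAR_BIT) - 1,
               "unsigned char appears to have padding bits");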
So I guess the answer is that char is not guaranteed to be free of trap values, but unsigned char is.