Is it safe to detect endianness with a union?

Posted 2019-02-16 13:59

Question:

In other words, according to the C standard, is this code safe? (Assume uint8_t is one byte)

#include <stdint.h>  /* uint8_t, uint16_t */
#include <stdio.h>   /* puts */

void detectEndianness(void){
    union {
        uint16_t w;
        uint8_t b;
    } a;
    a.w = 0x00FFU;
    if (a.b == 0xFFU) {
        puts("Little endian.");
    }
    else if (a.b == 0U) {
        puts("Big endian.");
    }
    else {
        puts("Stack Overflow endian.");
    }
}

What if I change it to the following? Note the third case, which I'm aware of.

a.w = 1U;
if (a.b == 1U) { puts("Little endian."); }
else if (a.b == 0U) { puts ("Big endian."); }
else if (a.b == 0x80U) { /* Special potential */ }
else { puts("Stack Overflow endian."); }

Answer 1:

Quoting from n1570:

6.5.2.3 Structure and union members - p3

A postfix expression followed by the . operator and an identifier designates a member of a structure or union object. The value is that of the named member, and is an lvalue if the first expression is an lvalue.

6.2.6.1 Representations of types, General - p7

When a value is stored in a member of an object of union type, the bytes of the object representation that do not correspond to that member but do correspond to other members take unspecified values.

It's allowed. Your use case could even be considered one intended purpose if footnote 95 is taken into account (even though footnotes are only informative):

If the member used to read the contents of a union object is not the same as the member last used to store a value in the object, the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6 (a process sometimes called "type punning"). This might be a trap representation.

Now, the uintN_t family of types is defined to have no padding bits:

7.20.1.1 Exact-width integer types - p2

The typedef name uintN_t designates an unsigned integer type with width N and no padding bits. Thus, uint24_t denotes such an unsigned integer type with a width of exactly 24 bits.

All of their bit patterns are therefore valid values, and no trap representations are possible. So we must conclude that the code does indeed check the endianness of uint16_t.
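As a minimal sketch of the union read described above, the check can return a value instead of printing, which makes it easier to test. The names byte_order and detect_u16_order are made up for illustration and are not from the question. The uint8_t member is widened to an array so that both bytes of the uint16_t are inspected.

```c
#include <assert.h>
#include <stdint.h>

enum byte_order { ORDER_LITTLE, ORDER_BIG, ORDER_UNKNOWN };

/* Hypothetical helper: classify the byte order of uint16_t via a union. */
enum byte_order detect_u16_order(void)
{
    /* If uint8_t exists, CHAR_BIT is 8, so uint16_t occupies two bytes. */
    static_assert(sizeof(uint16_t) == 2, "uint16_t must span two bytes");

    union {
        uint16_t w;
        uint8_t  b[sizeof(uint16_t)];
    } u;

    u.w = 0x0102U;  /* store through w, then read the bytes through b */

    if (u.b[0] == 0x02U && u.b[1] == 0x01U) return ORDER_LITTLE;
    if (u.b[0] == 0x01U && u.b[1] == 0x02U) return ORDER_BIG;
    return ORDER_UNKNOWN;  /* neither of the two usual layouts */
}
```

A caller would switch on the returned value; ORDER_UNKNOWN plays the role of the "Stack Overflow endian" branch in the question.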



Answer 2:

The standard (available in the linked online draft) says in a footnote that it is allowed to access a different member of the same union than the member previously written:

95) If the member used to read the contents of a union object is not the same as the member last used to store a value in the object, the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6 (a process sometimes called "type punning"). This might be a trap representation.

But the footnote also mentions a possible trap representation, and the only data type that the standard guarantees to be safe with respect to trap representations is unsigned char. Accessing a trap representation may be undefined behaviour. Although I don't think that uint32_t can yield a trap representation on your platform, whether accessing this member is UB is actually implementation-dependent.
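If the trap-representation concern matters, one way to sidestep it is to inspect the object representation through unsigned char, which the standard always permits. The following is only a sketch under that assumption; the function name u16_is_little_endian and its return convention are invented for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 for little endian, 0 for big endian,
   and -1 if the bytes match neither layout. */
int u16_is_little_endian(void)
{
    /* Unlike uint8_t, unsigned char may be wider than 8 bits, so check. */
    static_assert(sizeof(uint16_t) == 2, "this sketch assumes 8-bit bytes");

    const uint16_t w = 0x00FFU;
    /* Any object may be read byte by byte through unsigned char. */
    const unsigned char *p = (const unsigned char *)&w;

    if (p[0] == 0xFFU && p[1] == 0x00U) return 1;
    if (p[0] == 0x00U && p[1] == 0xFFU) return 0;
    return -1;
}
```

No union is involved here, so the question of which member was last stored does not arise.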



Answer 3:

There is no requirement that the order of bits within a byte match the ordering of the corresponding bits in a larger type. A conforming implementation which defines uint32_t and has an 8-bit unsigned char could, for example, store the upper 16 bits of the uint32_t using four bits from each byte, and store the bottom 16 bits using the remaining four bits of each byte. From the point of view of the Standard, any of the 32! permutations of bits would be equally acceptable.

That having been said, any implementation that isn't being deliberately obtuse and is designed to run on a commonplace platform will use one of two orderings [treating bytes as groups of 8 consecutive bits, in the order 0123 or 3210], and one that doesn't use one of the above and targets any platform that isn't totally obscure will use 2301 or 1032. The Standard doesn't forbid other orderings, but failure to accommodate them would be very unlikely to cause any trouble except when using obtusely-contrived implementations.
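To see which of these byte orderings a given implementation uses, a probe value can be copied into a byte array and read back. This is a rough sketch, not part of the answer: the function name u32_byte_order_signature is hypothetical, and it assumes the digits number the bytes by significance with 0 as the least significant byte, so that "0123" would be little endian and "3210" big endian. The answer does not spell out its numbering, so that convention is a guess.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: writes a 4-character signature such as "0123"
   or "3210" into out, followed by a terminating NUL. */
void u32_byte_order_signature(char out[5])
{
    static_assert(sizeof(uint32_t) == 4, "this sketch assumes 8-bit bytes");

    /* The byte with significance i (0 = least significant) holds value i. */
    const uint32_t probe = UINT32_C(0x03020100);
    unsigned char bytes[sizeof probe];
    memcpy(bytes, &probe, sizeof probe);

    for (size_t i = 0; i < sizeof probe; ++i) {
        /* '?' marks a byte outside 0..3, meaning bits were shuffled
           across byte boundaries as in the contrived example above. */
        out[i] = (bytes[i] <= 3U) ? (char)('0' + bytes[i]) : '?';
    }
    out[sizeof probe] = '\0';
}
```

A signature containing '?' would indicate one of the contrived bit permutations that the Standard permits but that ordinary platforms do not use.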