Is the strict aliasing rule really a “two-way stre

2019-01-23 15:13发布

In these comments user @Deduplicator insists that the strict aliasing rule permits access through an incompatible type if either of the aliased or the aliasing pointer is a pointer-to-character type (qualified or unqualified, signed or unsigned char *). So, his assertion is basically that both

long long foo;
char *p = (char *)&foo;
*p; // just in order to dereference 'p'

and

char foo[sizeof(long long)];
long long *p = (long long *)&foo[0];
*p; // just in order to dereference 'p'

are conforming and have defined behavior.

In my read, however, it is only the first form that is valid, that is, when the aliasing pointer is a pointer-to-char; however, one can't do that in the other direction, i. e. when the aliasing pointer points to an incompatible type (other than a character type), the aliased pointer being a char *.

So, the second snippet above would have undefined behavior.

What's the case? Is this correct? For the record, I have already read this question and answer, and there the accepted answer explicitly states that

The rules allow an exception for char *. It's always assumed that char * aliases other types. However this won't work the other way, there's no assumption that your struct aliases a buffer of chars.

(emphasis mine)

2条回答
Evening l夕情丶
2楼-- · 2019-01-23 15:55

You are correct to say that this is not valid. As you yourself have quoted (so I shall not re-quote here) the guaranteed valid cast is only from any other type to char*.

The other form is indeed against standard and causes undefined behaviour. However as a little bonus let us discuss a little behind this standard.

Chars, on every significant architecture is the only type that allows completely unaligned access, this is due to the read byte instructions having to work on any byte, otherwise they would be all but useless. This means that an indirect read to a char will always be valid on every CPU I know of.

However the other way around this will not apply, you cannot read a uint64_t unless the pointer is aligned to 8 bytes on most arches.

However, there is a very common compiler extension allowing you to cast properly aligned pointers from char to other types and access them, however this is non-standard. Also note, if you cast a pointer to any type to a pointer to char and then cast it back the resultant pointer is guaranteed to be equal to the original object. Therefore this is ok:

struct x *mystruct = MakeAMyStruct();
char * foo = (char *)mystruct;
struct x *mystruct2 = (struct mystruct *)foo;

And mystruct2 will equal mystruct. This also guarantees the struct is properly aligned for it's needs.

So basically, if you want a pointer to char and a pointer to another type, always declare the pointer to the other type then cast to char. Or even better use a union, that is what they are basically for...

Note, there is a notable exception to the rule however. Some old implementations of malloc used to return a char*. This pointer is always guaranteed to be castable to any type successfully without breaking aliasing rules.

查看更多
Juvenile、少年°
3楼-- · 2019-01-23 15:55

Deduplicator is correct. The undefined behaviour that allows compilers to implement "strict aliasing" optimizations doesn't apply when character values are being used to produce a representation of an object.

Certain object representations need not represent a value of the object type. If the stored value of an object has such a representation and is read by an lvalue expression that does not have character type, the behavior is undefined. If such a representation is produced by a side effect that modifies all or any part of the object by an lvalue expression that does not have character type, the behavior is undefined. Such a representation is called a trap representation.

However your second example has undefined behaviour because foo is uninitialized. If you initialize foo then it only has implementation defined behaviour. It depends on the implementation defined alignment requirements of long long and whether long long has any implementation defined pad bits.

Consider if you change your second example to this:

long long bar() {
    char *foo = malloc(sizeof(long long));
    char c;
    for(c = 0; c < sizeof(long long); c++)
        foo[c] = c;
    long long *p = (long long *) p;
    return *p;
}

Now alignment is no longer issue and this example is only dependent of the implementation defined representation of long long. What value is returned depends on the representation of long long but if that representation is defined as having no pad bits them this function must always return the same value and it must also always be a valid value. Without pad bits this function can't generate a trap representation, and so the compiler cannot perform any strict aliasing type optimizations on it.

You have to look pretty hard to find a standard conforming implementation of C that has implementation defined pad bits in any of its integer types. I doubt you'll find one that implements any sort of strict aliasing type of optimization. In other words, compilers don't use the undefined behaviour caused by accessing a trap representation to allow strict-aliasing optimizations because no compiler that implements strict-aliasing optimizations has defined any trap representations.

Note also that had buf been initialized with all zeros ('\0' characters) then this function wouldn't have any undefined or implementation defined behaviour. An all-bits-zero representation of a integer type is guaranteed not to be a trap representation and guaranteed to have the value 0.

Now for a strictly conforming example that uses char values to create a guaranteed valid (possibly non-zero) representation of a long long value:

#include <stdio.h>
#include <stdlib.h>

int
main(int argc, char **argv) {
    int i;
    long long l;
    char *buf;

    if (argc < 2) {
        return 1;
    }
    buf = malloc(sizeof l);
    if (buf == NULL) {
        return 1;
    }
    l = strtoll(argv[1], NULL, 10);
    for (i = 0; i < sizeof l; i++) {
        buf[i] = ((char *) &l)[i];
    }
    printf("%lld\n", *(long long *)buf);
    return 0;
}

This example has no undefined behaviour and is not dependent on the alignment or representation of long long. This is the sort of code that the character type exception on accessing objects was created for. In particular this means that Standard C lets you implement your own memcpy function in portable C code.

查看更多
登录 后发表回答