In these comments user @Deduplicator insists that the strict aliasing rule permits access through an incompatible type if either the aliased or the aliasing pointer is a pointer to a character type (qualified or unqualified, signed or unsigned char *). So, his assertion is basically that both
long long foo;
char *p = (char *)&foo;
*p; // just in order to dereference 'p'
and
char foo[sizeof(long long)];
long long *p = (long long *)&foo[0];
*p; // just in order to dereference 'p'
are conforming and have defined behavior.
In my reading, however, only the first form is valid, that is, when the aliasing pointer is a pointer-to-char. One can't do that in the other direction, i.e. when the aliasing pointer points to an incompatible type (other than a character type) and the aliased pointer is a char *.
So, the second snippet above would have undefined behavior.
What's the case? Is this correct? For the record, I have already read this question and answer, and there the accepted answer explicitly states that
The rules allow an exception for char *. It's always assumed that char * aliases other types. However this won't work the other way, there's no assumption that your struct aliases a buffer of chars.
(emphasis mine)
You are correct that the second form is not valid. As you have already quoted (so I will not re-quote it here), the only cast guaranteed to be valid for access is from any other type to char *.
The other form does violate the standard and causes undefined behaviour. As a small bonus, let us discuss some of the reasoning behind the rule.
On every significant architecture, char is the only type that allows completely unaligned access. This is because byte-read instructions have to work on any byte, otherwise they would be all but useless. This means that an indirect read of a char will always be valid on every CPU I know of.
The reverse does not hold: on most architectures you cannot read a uint64_t unless the pointer is aligned to 8 bytes.
There is, however, a very common compiler extension that lets you cast properly aligned pointers from char * to other types and access through them, but this is non-standard. Also note that if you cast a pointer to any type to a pointer to char and then cast it back, the resulting pointer is guaranteed to compare equal to the original pointer. Therefore this is ok:
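A minimal sketch of such a round trip, assuming a hypothetical struct type my_struct and a hypothetical helper round_trip (neither name comes from the answer itself):

```c
/* Hypothetical type, used only for illustration. */
struct my_struct {
    int id;
    double weight;
};

/* Converts a struct pointer to char * and back, returning the recovered pointer. */
struct my_struct *round_trip(struct my_struct *mystruct)
{
    /* Any object pointer may be converted to a pointer to character type. */
    char *bytes = (char *)mystruct;

    /* Converting back to the original pointer type yields a pointer
       that compares equal to the one we started with. */
    struct my_struct *mystruct2 = (struct my_struct *)bytes;

    return mystruct2;
}
```

Because the char * was derived from a real my_struct object, the storage it points to is already suitably aligned, which is what makes the conversion back safe.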
And mystruct2 will equal mystruct. This also guarantees that the struct is properly aligned for its needs.
So basically, if you want a pointer to char and a pointer to another type, always declare the pointer to the other type and then cast it to char *. Or even better, use a union; that is basically what they are for.
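As a rough sketch of the union approach, assuming a hypothetical union ll_bytes and helper first_byte (illustrative names only), the same storage can be viewed both as a long long and as raw bytes. Reading a member other than the one last stored is permitted as type punning in C99 and later, but not in C++:

```c
/* Hypothetical union: one block of storage viewed as a long long or as bytes. */
union ll_bytes {
    long long value;
    unsigned char bytes[sizeof(long long)];
};

/* Returns the lowest-addressed byte of v's object representation. */
unsigned char first_byte(long long v)
{
    union ll_bytes u;

    u.value = v;        /* store through the long long member */
    return u.bytes[0];  /* read back through the character-array member */
}
```

The union is aligned for its strictest member, so no alignment problem arises in either direction. Which byte comes first depends on the implementation's byte order.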
Note that there is one notable exception to the rule. Some old implementations of malloc returned a char *. Such a pointer is always guaranteed to be castable to any type without breaking aliasing rules.
Deduplicator is correct. The undefined behaviour that allows compilers to implement "strict aliasing" optimizations doesn't apply when character values are being used to produce a representation of an object.
However, your second example has undefined behaviour because foo is uninitialized. If you initialize foo, then it has only implementation-defined behaviour: it depends on the implementation-defined alignment requirements of long long and on whether long long has any implementation-defined pad bits.
Consider if you change your second example to this:
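One possible shape for such a change is sketched below. It is an assumption rather than the answer's own code: it takes the buffer from malloc, whose result is suitably aligned for any object type, and the function name bar is hypothetical.

```c
#include <stdlib.h>

/* Hypothetical sketch: fills a malloc'd buffer byte by byte,
   then reads the whole buffer back as a long long. */
long long bar(void)
{
    /* malloc returns storage suitably aligned for any object type. */
    char *buf = malloc(sizeof(long long));
    long long result;
    size_t i;

    if (buf == NULL)
        return 0;  /* allocation failed; 0 is an arbitrary fallback */

    /* Write the representation through a character type. */
    for (i = 0; i < sizeof(long long); i++)
        buf[i] = (char)i;

    /* Read the same bytes back as a long long. */
    result = *(long long *)buf;

    free(buf);
    return result;
}
```

The value returned depends on the byte order and representation of long long on the implementation at hand.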
Now alignment is no longer an issue, and this example depends only on the implementation-defined representation of long long. What value is returned depends on the representation of long long, but if that representation is defined as having no pad bits, then this function must always return the same value, and it must always be a valid value. Without pad bits this function can't generate a trap representation, and so the compiler cannot perform any strict-aliasing optimizations on it.
You have to look pretty hard to find a standard-conforming implementation of C that has implementation-defined pad bits in any of its integer types. I doubt you'll find one that implements any sort of strict-aliasing optimization. In other words, compilers don't use the undefined behaviour caused by accessing a trap representation to allow strict-aliasing optimizations, because no compiler that implements strict-aliasing optimizations has defined any trap representations.
Note also that had buf been initialized with all zeros ('\0' characters), this function wouldn't have any undefined or implementation-defined behaviour. An all-bits-zero representation of an integer type is guaranteed not to be a trap representation and guaranteed to have the value 0.
Now for a strictly conforming example that uses char values to create a guaranteed valid (possibly non-zero) representation of a long long value:
This example has no undefined behaviour and does not depend on the alignment or representation of long long. This is the sort of code that the character type exception on accessing objects was created for. In particular, this means that Standard C lets you implement your own memcpy function in portable C code.
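A minimal sketch of code in that spirit, assuming hypothetical names my_memcpy and copy_via_chars: the bytes of a valid long long are copied into a character buffer and from there into a second long long, with every access to the object representation going through unsigned char.

```c
#include <stddef.h>

/* Hypothetical byte-wise copy for non-overlapping objects,
   written using only character-type accesses. */
void *my_memcpy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    while (n--)
        *d++ = *s++;
    return dst;
}

/* Copies a valid long long representation into a char buffer,
   then from the buffer into a second long long. */
long long copy_via_chars(long long original)
{
    unsigned char buf[sizeof(long long)];
    long long copy;

    my_memcpy(buf, &original, sizeof original);  /* object -> bytes */
    my_memcpy(&copy, buf, sizeof copy);          /* bytes -> object */
    return copy;
}
```

Since the bytes in buf came from a valid long long, the copy holds the same value whatever the alignment of buf or the representation of long long.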