For example, is this code valid, or does it invoke undefined behavior by violating the aliasing rules?
int x;
struct s { int i; } y;
x = 1;
y = *(struct s *)&x;
printf("%d\n", y.i);
My interest is in using a technique based on this to develop a portable method for performing aliased reads.
Update: here is the intended use case. It is a little bit different, but it should be valid if and only if the above is valid:
static inline uint32_t read32(const unsigned char *p)
{
struct a { char r[4]; };
union b { struct a r; uint32_t x; } tmp;
tmp.r = *(struct a *)p;
return tmp.x;
}
GCC, as desired, compiles this to a single 32-bit load, and it seems to avoid the aliasing issues that could happen if p actually points to a type other than char. In other words, it seems to act as a portable replacement for the GNU C __attribute__((__may_alias__)) attribute. But I'm uncertain whether it's really well-defined...
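For comparison, the GNU C extension that read32 is meant to replace might be used roughly as below. This is a sketch, not code from the question: the names u32_may_alias and read32_gnu are hypothetical, the attribute is only understood by GCC-compatible compilers such as GCC and Clang, and it relaxes aliasing only, not alignment, so p is assumed to be suitably aligned for uint32_t.

```c
#include <stdint.h>

/* GNU C extension (GCC, Clang): lvalues of this type are treated like
   character types for aliasing purposes. Hypothetical typedef name. */
typedef uint32_t __attribute__((__may_alias__)) u32_may_alias;

/* Hypothetical helper. Assumes p is suitably aligned for uint32_t,
   because may_alias does not relax alignment requirements. */
static inline uint32_t read32_gnu(const unsigned char *p)
{
    return *(const u32_may_alias *)p;
}
```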
From the C standard:
The resulting pointer in this case is guaranteed to be correctly aligned (because the first member of a struct must be coincident with the struct), so this limitation doesn't apply here. What does apply are the additional restrictions on pointer use, which require that an object be accessed only through pointers compatible with the "effective type" of the object. In this case, the effective type of x is int, and so it cannot be accessed via a struct pointer.
Note that, contrary to some claims, the conversion between pointer types is not limited to round-trip use. The standard says that the pointer can be converted, with a proviso as to when such conversions result in undefined behavior. Elsewhere it gives the semantics of the use of pointers of the resulting type. The round-trip guarantees in the standard are additional specifications: things that you can count on that you could not if they were not explicitly stated:
This specifies a guarantee about the round trip; it is not a limitation to a round trip.
However, as noted, the "effective type" language is a limitation on the use of the pointer resulting from a conversion.
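By contrast, the direction the standard does define is the opposite one: starting from an object whose effective type really is struct s and reaching its first member through a converted pointer. A minimal sketch, assuming C11, where 6.7.2.1p15 says a pointer to a structure, suitably converted, points to its initial member; read_first_member is a hypothetical name.

```c
struct s { int i; };

/* Hypothetical helper. The object *ps really is a struct s, so converting
   the pointer yields a pointer to its initial member (C11 6.7.2.1p15), and
   reading it through an int lvalue matches the member's effective type. */
int read_first_member(struct s *ps)
{
    int *pi = (int *)ps;  /* points to ps->i */
    return *pi;
}
```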
My reading of aliasing rules (C99, 6.5p7) with the presence of this sentence:
leads me to think it does not violate the C aliasing rules.
But the fact that it does not violate the aliasing rules is not enough for this code snippet to be valid. It may invoke undefined behavior for other reasons.
The pointer resulting from the cast is not guaranteed to point to a valid struct s object. Even if we assume the alignment of x is suitable for an object of type struct s, the resulting pointer may not point to a space large enough to hold the structure object (as struct s may have padding after its last member).
EDIT: the answer has been completely reworked from its initial version.
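The layout facts this argument relies on can be checked directly. A minimal C11 sketch: offsetof, sizeof and _Static_assert are standard, while the helper name struct_s_trailing_padding is hypothetical. The standard guarantees that the first member sits at offset 0 and that the structure is at least as large as its member, but not that the two sizes are equal.

```c
#include <stddef.h>

struct s { int i; };

/* Guaranteed: no padding before the initial member. */
_Static_assert(offsetof(struct s, i) == 0, "initial member at offset 0");

/* Guaranteed: the struct is at least as large as its member. Equality is
   NOT guaranteed, since padding after the last member is permitted. */
_Static_assert(sizeof(struct s) >= sizeof(int), "struct holds its member");

/* Hypothetical helper: trailing padding bytes on this implementation
   (commonly 0 for this type, but an implementation may add some). */
size_t struct_s_trailing_padding(void)
{
    return sizeof(struct s) - (offsetof(struct s, i) + sizeof(int));
}
```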
In your second example, the structure type struct a might have some alignment restrictions. The compiler might decide that struct a is always 4-byte aligned, e.g., so that it can always use a 4-byte aligned read instruction without looking at the actual address. The pointer p that you receive as an argument to read32 has no such restriction, so accessing it as a struct a might cause a bus error.
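The alignment half of this argument can be made concrete. A sketch, assuming C11 <stdalign.h> and that uintptr_t is available (it is optional in the standard); aligned_for_struct_a is a hypothetical helper, and testing the integer value of a converted pointer is implementation-defined, though it works on mainstream flat-address targets. Whatever alignof(struct a) reports, the p passed to read32 is only known to satisfy the alignment of unsigned char.

```c
#include <stdalign.h>
#include <stdint.h>

struct a { char r[4]; };

/* Hypothetical helper: nonzero if p meets struct a's alignment on this
   implementation. The pointer-to-integer conversion is implementation-
   defined; this assumes a flat address space. */
int aligned_for_struct_a(const unsigned char *p)
{
    return ((uintptr_t)p % alignof(struct a)) == 0;
}
```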
I notice that this type of argument is a "practical" one.
From the point of view of the standard, this is UB as soon as
(struct a*)p
is a conversion to a type with more restrictive alignment requirements.
I believe this will still violate effective typing rules. You want to access a memory location that wasn't declared explicitly (or implicitly via storage in case of dynamic allocation) as containing a struct a through an expression of that type.
None of the sections that have been quoted in other answers can be used to escape this basic restriction.
However, I believe there's a solution to your problem: use __builtin_memcpy(), which is available even in freestanding environments (see the manual entry on -fno-builtin).
Note that the issue is a bit less clear-cut than I make it sound. C11 section 6.5 §7 tells us that it's fine to access an object through an lvalue expression that has an aggregate or union type that includes one of the aforementioned types among its members.
The C99 rationale makes it clear that this restriction is there so a pointer to an aggregate and a pointer to one of its members may alias.
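The aliasing the rationale has in mind looks like the following sketch: because struct s contains an int, a compiler must assume that a struct s * and an int * can refer to overlapping storage, so it cannot reuse the value it just stored through pi. The function name store_then_reload is hypothetical.

```c
struct s { int i; };

/* Hypothetical example. Since struct s has an int member, an lvalue of
   type struct s may access an int object (the aggregate bullet of
   C11 6.5p7), so *ps and *pi may overlap. */
int store_then_reload(struct s *ps, int *pi)
{
    *pi = 1;
    *ps = (struct s){ 2 };  /* may overwrite the int that pi points to */
    return *pi;             /* must be reloaded; cannot be folded to 1 */
}
```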
I believe the ability to use this loophole in the way of the first example (but not the second one, assuming p doesn't happen to point to an actual char [4]) is an unintended consequence, which the standard only fails to disallow because of imprecise wording.
Also note that if the first example were valid, we'd basically be able to sneak structural typing into an otherwise nominally typed language. Structures in a union with a common initial subsequence aside (and even then, member names do matter), an identical memory layout is not enough to make types compatible. I believe the same reasoning applies here.
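A sketch of the __builtin_memcpy() suggestion, written so that it falls back to standard memcpy on compilers other than GCC and Clang; COPY_BYTES is a hypothetical macro name. memcpy copies the bytes as if through character types, so it sidesteps both the effective-type and the alignment objections, and on targets with cheap unaligned loads optimizing compilers typically turn a fixed 4-byte copy like this into a single load. As with the original read32, the result is in native byte order.

```c
#include <stdint.h>
#include <string.h>

/* GCC and Clang accept __builtin_memcpy even under -fno-builtin, where a
   plain memcpy call would not be treated as a builtin. COPY_BYTES is a
   hypothetical name. */
#if defined(__GNUC__)
#define COPY_BYTES __builtin_memcpy
#else
#define COPY_BYTES memcpy
#endif

/* Reads 4 bytes starting at p, whatever p's alignment or the declared type
   of the object it points into. Result is in native byte order. */
static inline uint32_t read32(const unsigned char *p)
{
    uint32_t x;
    COPY_BYTES(&x, p, sizeof x);
    return x;
}
```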
Not sure it's a proper answer, but what could happen (in your second example) is this:
The compiler can lay out struct a as an 8-byte object, with padding after the 4 bytes in the array (why? because it can). Then the statement tmp.r = *(struct a *)p; treats p as the address of a struct a (namely, an 8-byte object). It tries to copy the contents of this object into tmp.r, that is, 8 bytes from the address that p is holding. But you're only allowed to read 4 bytes from there.
Implementations do not have to copy padding bytes, but they're allowed to do so.