The following example is given in the C11 standard, 6.5.2.3:

The following is not a valid fragment (because the union type is not visible within function f):

    struct t1 { int m; };
    struct t2 { int m; };
    int f(struct t1 *p1, struct t2 *p2)
    {
        if (p1->m < 0)
            p2->m = -p2->m;
        return p1->m;
    }
    int g()
    {
        union {
            struct t1 s1;
            struct t2 s2;
        } u;
        /* ... */
        return f(&u.s1, &u.s2);
    }

Why does it matter that the union type is visible to the function f?
In reading through the relevant section a couple times, I could not see anything in the containing section disallowing this.
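It matters because of 6.5.2.3 paragraph 6 (emphasis added):

One special guarantee is made in order to simplify the use of unions: if a union contains several structures that share a common initial sequence (see below), and if the union object currently contains one of these structures, it is permitted to inspect the common initial part of any of them *anywhere that a declaration of the completed type of the union is visible*. Two structures share a common initial sequence if corresponding members have compatible types (and, for bit-fields, the same widths) for a sequence of one or more initial members.
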
It's not an error that requires a diagnostic (a syntax error or constraint violation), but the behavior is undefined: the `m` members of the `struct t1` and `struct t2` objects occupy the same storage, yet because `struct t1` and `struct t2` are different types, the compiler is permitted to assume that they don't. Specifically, it may assume that changes to `p1->m` won't affect the value of `p2->m`. The compiler could, for example, save the value of `p1->m` in a register on first access, and then not reload it from memory on the second access.

Given the declarations:
the lvalues `u.x`, `up->x`, `s.x`, and `sp->x` are all of type `int`, but any access to any of those lvalues (at least with the pointers initialized as shown) will also access the stored value of an object of type `union U` or `struct S`. Since N1570 6.5p7 only allows objects of those types to be accessed via lvalues whose types are either character types, or other structs or unions that contain objects of type `union U` and `struct S`, it would not impose any requirements about the behavior of code that attempts to use any of those lvalues.

I think it's clear that the authors of the Standard intended that compilers allow objects of struct or union types to be accessed using lvalues of member type in at least some circumstances, but not necessarily that they allow arbitrary lvalues of member type to access objects of struct or union types. There is nothing normative to differentiate the circumstances where such accesses should be allowed or disallowed, but there is a footnote to suggest that the purpose of the rule is to indicate when things may or may not alias.
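For concreteness, the declarations referred to at the start of this answer could look like the following. This is an assumed reconstruction: the names are chosen only to match the lvalues `u.x`, `up->x`, `s.x`, and `sp->x` mentioned above, and `sum_of_members` is a hypothetical helper added for illustration.

```c
/* Assumed declarations: names chosen to match the lvalues in the text. */
union U { int x; } u, *up = &u;
struct S { int x; } s, *sp = &s;

/* Hypothetical helper. Each operand is an lvalue of type int that
   designates storage inside an object whose declared type is
   union U or struct S. */
int sum_of_members(void)
{
    return u.x + up->x + s.x + sp->x;
}
```

Every compiler in practical use accepts such member accesses; the point of the paragraph above is that a literal reading of 6.5p7 does not obviously say why.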
If one interprets the rule as only applying in cases where lvalues are used in ways that alias seemingly-unrelated lvalues of other types, such an interpretation would define the behavior of code like:
when the latter was passed a `struct s1*`, `struct s2*`, or `union s1s2*` that identifies an object of its type, or the freshly-derived address of either member of `union s1s2`. In any context where an implementation would see enough to have reason to care about whether operations on the original and derived lvalues would affect each other, it would be able to see the relationship between them.

Note, however, that such an implementation would not be required to allow for the possibility of aliasing in code like the following:
even though the Common Initial Sequence guarantee would seem to allow for that.
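A sketch of the two contrasting cases may help here. The type names `struct s1`, `struct s2`, and `union s1s2` and the arrays `pos[]` and `vel[]` come from the text; everything else (the member lists, `get_x`, `struct position`, `struct velocity`, `update_vectors`) is an assumption made for illustration and not necessarily the code this answer had in mind.

```c
/* Two structs sharing a common initial sequence (the int member x). */
struct s1 { int x; float y; };
struct s2 { int x; double y; };
union s1s2 { struct s1 v1; struct s2 v2; };

/* First case (hypothetical function): the struct s1 lvalue is derived,
   right here, from the pointer that was passed in, so a compiler can
   see where it came from. */
int get_x(void *p)
{
    return ((struct s1 *)p)->x;
}

/* Second case (hypothetical types and function): two unrelated-looking
   struct types that happen to share a layout. Nothing in this function
   suggests that pos[] and vel[] might overlap. */
struct position { double x, y, z; };
struct velocity { double x, y, z; };

void update_vectors(struct position *pos, struct velocity *vel, int n)
{
    for (int i = 0; i < n; i++) {
        pos[i].x += vel[i].x;
        pos[i].y += vel[i].y;
        pos[i].z += vel[i].z;
    }
}
```

Calling `get_x` with the address of a `struct s1`, of a `union s1s2` holding `v1`, or of its `v1` member is uncontroversial. Passing a `struct s2*` is the contested case this answer discusses.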
There are many differences between the two examples, and thus many indications that a compiler could use to allow for the realistic possibility that, if the first code is passed a `struct s2*`, it might be accessing a `struct s2`, without having to allow for the more dubious possibility that operations upon `pos[]` in the second example might affect elements of `vel[]`.

Many implementations seeking to support the Common Initial Sequence rule in useful fashion would be able to handle the first even if no `union` type were declared, and I don't know that the authors of the Standard intended that merely adding a `union` type declaration should force compilers to allow for the possibility of arbitrary aliasing among common initial sequences of members therein. The most natural intention I can see for mentioning union types would be that compilers which are unable to perceive any of the numerous clues present in the first example could use the presence or absence of any complete union type declaration featuring two types as an indication of whether lvalues of one such type might be used to access another.

Note that neither N1570 6.5p7 nor its predecessors make any effort to describe all cases where quality implementations should behave predictably when given code that uses aggregates. Most such cases are left as Quality of Implementation issues. Since low-quality-but-conforming implementations are allowed to behave nonsensically for almost any reason they see fit, there was no perceived need to complicate the Standard with cases that anyone making a bona fide effort to write a quality implementation would handle whether or not it was required for conformance.

Note: This answer doesn't directly answer your question but I think it is relevant and is too big to go in comments.
I think the example in the code is actually correct. It's true that the union common initial sequence rule doesn't apply; but nor is there any other rule which would make this code incorrect.
The purpose of the common initial sequence rule is to guarantee the same layout of the structs. However that is not even an issue here, as the structs only contain a single `int`, and structs are not permitted to have initial padding.

Note that, as discussed here, sections in ISO/IEC documents titled Note or Example are "non-normative", which means they do not actually form a part of the specification.
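The layout claim above (no padding before the first member) can be checked directly. The following is a minimal sketch assuming a C11 compiler; `members_share_address` is a hypothetical helper name.

```c
#include <stddef.h>

struct t1 { int m; };
struct t2 { int m; };

/* A struct may not begin with padding, so the first member sits at offset 0. */
_Static_assert(offsetof(struct t1, m) == 0, "no padding before first member");
_Static_assert(offsetof(struct t2, m) == 0, "no padding before first member");

/* Hypothetical helper: returns 1 when both views of the union place m
   at the address of the union itself. */
int members_share_address(void)
{
    union { struct t1 s1; struct t2 s2; } u;
    return (void *)&u.s1.m == (void *)&u
        && (void *)&u.s2.m == (void *)&u;
}
```

This only confirms that the two `m` members occupy the same storage; it says nothing about whether accessing one through the other's type is permitted, which is the question the rest of this answer addresses.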
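It has been suggested that this code violates the strict aliasing rule. Here is the rule, from C11 6.5/7:

An object shall have its stored value accessed only by an lvalue expression that has one of the following types:

- a type compatible with the effective type of the object,
- a qualified version of a type compatible with the effective type of the object,
- a type that is the signed or unsigned type corresponding to the effective type of the object,
- a type that is the signed or unsigned type corresponding to a qualified version of the effective type of the object,
- an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or
- a character type.
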
In the example, the object being accessed (denoted by `p2->m` or `p1->m`) has type `int`. The lvalue expressions `p1->m` and `p2->m` have type `int`. Since `int` is compatible with `int`, there is no violation.

It's true that `p2->m` means `(*p2).m`, however this expression does not access `*p2`. It only accesses the `m`.

Either of the following would be undefined:
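A sketch of two accesses of that kind, shown next to the member access this answer considers fine. The function names are hypothetical and these are not necessarily the answer's own snippets; the struct definitions are the ones from the question.

```c
struct t1 { int m; };
struct t2 { int m; };

/* Reads only an int through an lvalue of type int: under the reading
   argued in this answer, no strict-aliasing violation. */
int read_member(struct t2 *p2)
{
    return p2->m;
}

/* Undefined when p2 points to an object whose effective type is
   struct t2: the whole object is read through an lvalue of type
   struct t1, which is not compatible with struct t2. */
struct t1 read_whole_as_t1(struct t2 *p2)
{
    return *(struct t1 *)p2;
}

/* Undefined for the same reason when p2 points to a declared
   struct t2 object: a whole-object write through a struct t1 lvalue. */
void write_whole_as_t1(struct t2 *p2)
{
    *(struct t1 *)p2 = (struct t1){ 0 };
}
```

The difference is which lvalue performs the access: `p2->m` has type `int`, whereas `*(struct t1 *)p2` has type `struct t1`.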