Suppose I have a chunk of dynamically allocated data:
void* allocate (size_t n)
{
    void* foo = malloc(n);
    ...
    return foo;
}
I wish to use the data pointed at by foo as a special type, type_t. But I want to do this later, not during allocation. In order to give the allocated data an effective type, I can therefore do something like:
void* allocate (size_t n)
{
    void* foo = malloc(n);
    (void) *(type_t*)foo;
    ...
    return foo;
}
As per C11 6.5/6, this lvalue access should make the effective type type_t:
For all other accesses to an object having no declared type, the effective type of the object is simply the type of the lvalue used for the access.
However, the line (void) *(type_t*)foo; contains no side effects, so the compiler should be free to optimize it away, and I wouldn't expect it to generate any actual machine code.
My question is: are tricks like the above safe? Does giving the data an effective type count as a side-effect? Or by optimizing away the code, will the compiler also optimize away the choice of effective type?
That is, with the above lvalue access trick, if I now call the above function like this:
int* i = allocate(sizeof(int));
*i = something;
Does this cause a strict aliasing violation (UB) as expected, or is the effective type now int?
The sentence from the standard that you are citing only says something about the access to the object. The only changes to the effective type of the object that the standard describes are in the two sentences before it, which make clear that you have to store into the object through an lvalue of the type that you want to become effective.
6.5/6
If a value is stored into an object having no declared type through an
lvalue having a type that is not a character type, then the type of the lvalue becomes the
effective type of the object for that access and for subsequent accesses that do not modify
the stored value.
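In other words, if you want the allocation to acquire the effective type type_t at that point, an actual store has to happen there. A minimal sketch of what that could look like (type_t here is a hypothetical stand-in struct, and zero-initialization is chosen purely for illustration):

#include <stdlib.h>

typedef struct { int x; } type_t;      /* hypothetical stand-in for the question's type_t */

void* allocate (size_t n)              /* assumes n >= sizeof(type_t) */
{
    void* foo = malloc(n);
    if (foo)
        *(type_t*)foo = (type_t){0};   /* an actual store sets the effective type to type_t */
    return foo;
}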
Nothing in the Standard suggests that a write to an object need only be recognized as setting the Effective Type in cases where the operation has other side-effects as well (such as changing the pattern of bits stored in that object). On the other hand, compilers that use aggressive type-based optimization seem unable to recognize a possible change of an object's Effective Type as a side-effect which must be preserved even when the write has no other observable side-effects.
To understand what the Effective Type rule actually says, I think it's necessary to understand where it came from. So far as I can tell, it appears to be derived from Defect Report #028, more specifically the rationale used to justify the conclusion given therein. The conclusion given is reasonable, but the rationale given is absurd.
Essentially, the basic premise involves the possibility of something like:
void actOnTwoThings(T1 *p1, T2 *p2)
{
    /* ... code that uses p1 and p2 ... */
}

/* ... in some other function ... */
union { T1 v1; T2 v2; } u;
actOnTwoThings(&u.v1, &u.v2);
Because the act of writing a union as one type and reading it as another yields Implementation-Defined behavior, the behavior of writing one union member via pointer and reading another isn't fully defined by the Standard, and should therefore (by the logic of DR #028) be treated as Undefined Behavior. Although the use of p1 and p2 to access the same storage should in fact be treated as UB in many scenarios like the above, the rationale is totally faulty. Specifying that an action yields Implementation-Defined Behavior is very different from saying that it yields Undefined Behavior, especially in cases where the Standard would impose limits on what the Implementation-Defined behavior could be.
A key result of deriving pointer-type rules from the behavior of unions is that behavior is fully and unambiguously defined, with no Implementation-Defined aspects, if code writes a union any number of times using any members, in any sequence, and then reads the last member written. While requiring that implementations allow for this will block some otherwise-useful optimizations, it's pretty clear that the Effective Type rules are written to require such behavior.
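For instance, under that reading, a sequence like the following is fully defined no matter how the two members' representations relate (a minimal sketch using two arbitrary member types):

#include <stdio.h>

int main(void)
{
    union { float v1; int v2; } u;

    u.v1 = 1.0f;            /* write via one member...                          */
    u.v2 = 42;              /* ...then overwrite via the other                  */
    printf("%d\n", u.v2);   /* reading the last member written is fully defined
                               and must print 42                                */
    return 0;
}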
A bigger problem arising from basing type rules on the behavior of unions is that the action of reading a union using one type and writing the union with another type need not be regarded as having any side-effects if the new bit pattern matches the old. Since an implementation would have to define the new bit pattern as representing the value that was written as the new type, it would also have to define the (identical) old bit pattern as representing that same value. Given the function (assume 'long' and 'long long' have the same size and representation):
long test(long *p1, long long *p2, void *p3)
{
    if (*p1)
    {
        long long temp;
        *p2 = 1;
        temp = *(long long*)p3;
        *(long*)p3 = temp;
    }
    return *p1;
}
both gcc and clang will decide that the write via *(long*)p3 can't have any effect, since it simply stores back the same bit pattern that had been read via *(long long*)p3. That would be true if the following read of *p1 were going to be processed with Implementation-Defined behavior in the event the storage had been written via *p2, but it isn't true if that case is regarded as UB. Unfortunately, since the Standard is inconsistent about whether the behavior is Implementation-Defined or Undefined, it's inconsistent about whether the write needs to be regarded as a side-effect.
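A hedged sketch of a caller that exercises this (not taken from either compiler's test suite; it assumes, as above, that long and long long have the same size and representation): the storage is allocated, so it has no declared type, and under a literal reading of the effective-type rules the program should print 1, while a compiler that drops the *(long*)p3 store and assumes *p2 cannot alias *p1 may reuse the earlier load and print 5.

#include <stdio.h>
#include <stdlib.h>

long test(long *p1, long long *p2, void *p3);   /* the function shown above */

int main(void)
{
    /* Sized for the larger type, in case the two types differ after all. */
    long *p = malloc(sizeof(long long));
    if (!p) return 1;
    *p = 5;                                 /* effective type becomes long */
    /* All three arguments alias the same allocated object. */
    printf("%ld\n", test(p, (long long *)p, p));
    free(p);
    return 0;
}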
From a practical perspective, when not using -fno-strict-aliasing, gcc and clang should be regarded as processing a dialect of C in which Effective Types, once set, become permanent. They cannot reliably recognize all cases where Effective Types may be changed, even though the logic necessary to handle that could easily and efficiently handle many cases which the authors of gcc have long claimed cannot possibly be handled without gutting optimization.
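As a closing illustration of the kind of effective-type change at issue (a minimal sketch, not code from either compiler's documentation), the following reuses one allocation first as float and then as int; a literal reading of 6.5/6 makes the final read well-defined, whereas the "permanent effective type" dialect described above offers no general guarantee that such a change is honored once the optimizer can no longer see all the stores.

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    void *p = malloc(sizeof(float) > sizeof(int) ? sizeof(float) : sizeof(int));
    if (!p) return 1;

    *(float *)p = 1.0f;         /* effective type of the allocation becomes float */
    *(int *)p   = 42;           /* per 6.5/6, this store changes it back to int   */
    printf("%d\n", *(int *)p);  /* so this read is defined and should print 42    */

    free(p);
    return 0;
}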