C/C++ strict aliasing, object lifetime and modern

Posted 2019-02-12 22:32

Question:

I am facing confusion about the C++ strict-aliasing rule and its possible implications. Consider the following code:

#include <stdint.h>   // declares int32_t in the global namespace

int main() {
  int32_t a = 5;
  float* f = (float*)(&a);
  *f = 1.0f;

  int32_t b = a;   // Probably not well-defined?
  float g = *f;    // What about this?
}

Looking at the C++ standard, section 3.10/10, technically none of the given code seems to violate the "aliasing rules" given there:

If a program attempts to access the stored value of an object through an lvalue of other than one of the following types the behavior is undefined:
... a list of qualified accessor types ...

  • *f = 1.0f; doesn't break the rules because there is no access to a stored value, i.e. I am just writing to memory through a pointer. I'm not reading from memory or trying to interpret a value here.
  • The line int32_t b = a; doesn't violate the rules because I am accessing through its original type.
  • The line float g = *f; doesn't break the rules for just the same reason.

In another thread, member CortAmmon actually makes the same point in a response, and adds that any possible undefined behavior arising through writes to live objects, as in *f = 1.0f;, would be accounted for by the standard's definition of "object lifetime" (which seems to be trivial for POD types).

HOWEVER: There is plenty of evidence on the internet that the above code will produce UB on modern compilers. See here and here for example.
The argument in most cases is that the compiler is free to assume that &a and f do not alias each other, and is therefore free to reorder instructions.
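
To make that argument concrete, here is a minimal sketch. The function store_both is hypothetical and is not taken from the linked posts. Because an int32_t and a float may not legally alias, an optimizer may assume that the store through f leaves *i untouched and return the constant 1 without reloading it.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical function for illustration only.
std::int32_t store_both(std::int32_t* i, float* f) {
    assert(i != nullptr && f != nullptr);
    *i = 1;
    *f = 2.0f;   // under the aliasing rules this store cannot legally modify *i
    return *i;   // so the compiler may return 1 without re-reading memory
}
```

Called with two distinct objects, this is well-defined and returns 1. Called with the pointers from the question (both referring to a), the result would depend on whether the compiler reloads *i, which is exactly the kind of surprise those posts describe.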

The big question now is if such compiler behavior would actually be an "over-interpretation" of the standard.
The only time the standard talks about "aliasing" specifically is in a footnote to 3.10/10, where it makes clear that those are the rules that shall govern aliasing.
As I mentioned earlier, I do not see any of the above code violating the standard, yet a large number of people (possibly including compiler writers) would consider it illegal.

I would really really appreciate some clarification here.

Small Update:
As member BenVoigt pointed out correctly, int32_t may not align with float on some platforms so the given code may be in violation of the "storage of sufficient alignment and size" rule. I would like to state that int32_t was chosen intentionally to align with float on most platforms and that the assumption for this question is that the types do indeed align.
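
The size and alignment assumption can be stated as a compile-time check. This is only a sketch, and the helper name int32_storage_fits_float is made up for illustration. On a platform where the assumption does not hold, the static_assert stops compilation.

```cpp
#include <cstdint>

// Hypothetical helper: true when storage for an int32_t is large enough
// and suitably aligned to hold a float.
constexpr bool int32_storage_fits_float() {
    return sizeof(float) <= sizeof(std::int32_t) &&
           alignof(float) <= alignof(std::int32_t);
}

static_assert(int32_storage_fits_float(),
              "this question assumes int32_t storage can hold a float");
```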

Small Update #2:
As several members have pointed out, the line int32_t b = a; is probably in violation of the standard, although not with absolute certainty. I agree with that standpoint and, not changing any aspect of the question, ask readers to exclude that line from my statement above that none of the code is in violation of the standard.

Answer 1:

You're wrong in your third bullet point (and maybe the first one too).

You state "The line float g = *f; doesn't break the rules for just the same reason.", where "just the same reason" (a little vague) seems to refer to "accessing through its original type". But that's not what you're doing. You're accessing an int32_t (named a) through an lvalue of type float (obtained from the expression *f). So you're violating the standard.

I also believe (though I am less sure about this one) that storing a value is an access to that stored value, so even *f = 1.0f; violates the rules.



Answer 2:

I think this statement is incorrect:

The line int32_t b = a; doesn't violate the rules because I am accessing through its original type.

The object that is stored at location &a is now a float, so you are attempting to access the stored value of a float through an lvalue of the wrong type.



Answer 3:

There are some significant ambiguities in the specification of object lifetime and access, but here are some problems with the code according to my reading of the spec.

float* f = (float*)(&a);

This performs a reinterpret_cast. As long as float does not require stricter alignment than int32_t, you can cast the resulting value back to int32_t* and get the original pointer. Beyond that round trip, using the result is not otherwise defined.

*f = 1.0f;

Assuming *f aliases with a (and that the storage for an int32_t has the appropriate alignment and size for a float) then the above line ends the lifetime of the int32_t object and places a float object in its place:

The lifetime of an object of type T begins when: storage with the proper alignment and size for type T is obtained, and if the object has non-trivial initialization, its initialization is complete.

The lifetime of an object of type T ends when: [...] the storage which the object occupies is reused or released.

—3.8 Object lifetime [basic.life]/1

We're reusing the storage, but if int32_t has the same size and alignment requirements then it seems like a float always existed in the same place (since the storage was 'obtained'). Perhaps we can avoid this ambiguity by changing this line to new (f) float {1.0f};, so we know that the float object has a lifetime that began at or before the completion of the initialization.
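
A minimal sketch of that placement-new variant follows. It is a variation, not the code from the question: it uses a raw byte buffer instead of the variable a, and the function name reuse_storage_as_float is hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <new>

// Hypothetical function for illustration only.
float reuse_storage_as_float() {
    // Raw storage that is big enough and suitably aligned for either type.
    alignas(std::int32_t) alignas(float) unsigned char
        storage[sizeof(std::int32_t) > sizeof(float) ? sizeof(std::int32_t)
                                                     : sizeof(float)];

    std::int32_t* ip = new (storage) std::int32_t(5);  // an int32_t lives here
    assert(*ip == 5);

    float* fp = new (storage) float{1.0f};  // reuses the storage; the int32_t's
                                            // lifetime ends, a float's begins
    return *fp;  // read the float through the pointer placement new returned
}
```

After the second placement new, reading *ip would be the same kind of aliasing violation discussed for int32_t b = a; below, so the sketch only reads through fp.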

Additionally, 'access' does not necessarily just mean 'read'. It can mean both reads and writes. So the write performed by *f = 1.0f; could be considered 'accessing the stored value' by writing over it, in which case this is also an aliasing violation.

So now assuming that a float object exists and the int32_t object's lifetime has ended:

int32_t b = a;

This code accesses the stored value of a float object through a glvalue with type int32_t, and is clearly an aliasing violation. The program has undefined behavior under 3.10/10.

float g = *f;

Assuming that int32_t has the right alignment and size requirements, and that the pointer f has been obtained in a way that allows its use to be well defined, then this should legally access the float object that was initialized with 1.0f.



Answer 4:

I've learned the hard way that quoting 6.5/7 from the C99 standard is unhelpful without also looking at 6.5/6. See this answer for the relevant quotes.

6.5/6 makes it clear that the type of an object can, under certain circumstances, change many times during its lifetime. It can take on the type of the value that was most recently written to it. This is really useful.

We need to draw a distinction between "declared type" and "effective type". A local variable, or static global, has a declared type. You are stuck with that type, I think, for the lifetime of that object. You may read from the object using a char *, but the "effective type" doesn't change, unfortunately.

But the memory returned by malloc has "no declared type". This will remain true until it is freed. It will never have a declared type, but its effective type can change according to 6.5/6, always taking on the type of the most recent write.

So, this is legal:

#include <stdint.h>
#include <stdlib.h>

int main(void) {
    void *vp = malloc(sizeof(int32_t) + sizeof(float)); // it's big enough,
                    //  and malloc will look after alignment for us.
    if (vp == NULL) return 1;   // allocation failed
    int32_t *ap = vp;
    *ap = 5;      // make int32_t the 'effective type'
    float *f = vp;
    *f = 1.0f;    // this (legally) changes the effective type.

    // int32_t b = *ap;   // Not defined, because the
                          // effective type is wrong
    float g = *f;    // OK, because the effective type is (currently) correct.
    (void)g;         // silence the unused-variable warning
    free(vp);
    return 0;
}

So, basically, writing to malloc-ed space is a valid way to change its type. But I guess that doesn't give us a way to look at pre-existing data through the "lens" of a new type, which might be interesting; I think that is impossible unless we use the various char* exceptions to peek at data of the "wrong" type.
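
One common way to use those byte-level exceptions is to copy the object representation with memcpy instead of dereferencing a cast pointer. The sketch below is written in C++ and assumes a 32-bit float; the helper names float_bits, float_from_bits and check_round_trip are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: return the bytes of a float as a 32-bit integer.
std::uint32_t float_bits(float value) {
    static_assert(sizeof(std::uint32_t) == sizeof(float),
                  "sketch assumes a 32-bit float");
    std::uint32_t bits;
    std::memcpy(&bits, &value, sizeof bits);  // byte-wise copy, no aliasing issue
    return bits;
}

// Hypothetical helper: the inverse of float_bits.
float float_from_bits(std::uint32_t bits) {
    float value;
    std::memcpy(&value, &bits, sizeof value);
    return value;
}

// Copying the bytes back must reproduce the original (non-NaN) value.
void check_round_trip(float value) {
    assert(float_from_bits(float_bits(value)) == value);
}
```

On platforms that use the IEEE 754 binary32 format, float_bits(1.0f) is 0x3F800000. In C++20, std::bit_cast from <bit> expresses the same copy in a single call.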