Strict-aliasing and pointer to union fields

Posted 2019-05-06 15:06

Question:

I have a question about the strict-aliasing rules, unions, and the standard. Assume we have the following code:

#include <stdio.h>

union
{
    int f1;
    short f2;
} u = {0x1};

int     * a = &u.f1;
short   * b = &u.f2;

int main()
{
    u.f1 = 1;
    *a += 1;
    u.f2 = 2;
    *b *= 2;

    printf( "%d %hd\n", *a, *b);

    return 0;
}

Now let's look at how it behaves:

$ gcc-5.1.0-x86_64 t.c -O3 -Wall && ./a.out 
2 4
$ gcc-5.1.0-x86_64 t.c -O3 -Wall -fno-strict-aliasing && ./a.out 
4 4

We can see that strict aliasing breaks the dependency between the accesses. Moreover, the code appears to be correct and does not seem to violate the strict-aliasing rule.

  1. Does it follow that, for union fields, the object located at that address is compatible with the types of all union members?
  2. If 1 is true, what should the compiler do with pointers to union members? Is it a defect in the standard that it allows such compiler behavior? If not, why?
  3. Generally speaking, it is unacceptable for a compiler to behave differently on correct code. So this also seems to be a compiler bug (especially since, if the addresses of the union fields are taken inside a function, strict aliasing does not break the dependency).

Answer 1:

The C standard explicitly permits aliasing via unions.

However, consider the following code:

void func(int *a, short *b)
{
     *a = 1; 
     printf("%hd\n", *b);
}

The intent of the strict aliasing rule is that a and b may be assumed not to alias. However, you could call func(&u.f1, &u.f2);.

To resolve this dilemma, a common-sense solution is to say that the exemption unions have from the strict aliasing rule applies only when the union members are accessed by name.

The Standard doesn't explicitly state this. It could be argued that "If the member used..." (6.5.2.3) actually is specifying that the 'bypass' only occurs when accessing the member by name, but it's not 100% clear.

However, it is hard to come up with any alternative interpretation that is self-consistent. One possible alternative is that writing func(&u.f1, &u.f2) causes UB because overlapping objects were passed to a function that 'knows' it does not receive overlapping objects, somewhat like a restrict violation.

If we apply the first interpretation (access by name only) to your example, we would say that the *a in your printf causes UB because the object currently stored at that location is a short, and 6.5.2.3 doesn't kick in because we are not using the union member by name.

I'd guess based on your posted results that gcc is using the same interpretation.
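
For contrast, here is a minimal sketch of the question's sequence of operations rewritten so that every access names the union member instead of going through an int * or short *. The type name int_or_short and the function by_name_sequence are hypothetical, introduced only for illustration. Under the by-name interpretation described above, none of these accesses relies on two differently typed pointers aliasing.

```c
/* Hypothetical named version of the anonymous union from the question. */
union int_or_short {
    int f1;
    short f2;
};

/* Same operations as in the question's main(), but each access goes
   through the union object and names the member explicitly. */
short by_name_sequence(void)
{
    union int_or_short u = {0x1}; /* initializes the first member, f1 */

    u.f1 = 1;
    u.f1 += 1;   /* f1 was stored last, so this is an ordinary update */
    u.f2 = 2;    /* f2 is now the member last stored */
    u.f2 *= 2;   /* read and write f2 by name */

    return u.f2; /* f2 holds 4 here */
}
```

Only u.f2 is returned because it is the member last stored. Reading u.f1 at that point would be type punning, and its value would depend on the sizes and byte order of int and short.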

This has been discussed before here but I can't find the thread right now.



Answer 2:

C99 Technical Corrigendum 3 clarifies union-based type punning by stating in section 6.5.2.3:

If the member used to access the contents of a union object is not the same as the member last used to store a value in the object, the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6 (a process sometimes called "type punning").

See here from 1042 through 1044
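
As an illustration of the quoted wording, here is a minimal sketch of type punning where the store and the load both name a union member. The helper name float_bits is hypothetical, and the sketch assumes that float is a 32-bit IEEE 754 type, which the C standard does not guarantee.

```c
#include <stdint.h>

/* Hypothetical helper: returns the object representation of a float
   as a 32-bit unsigned integer.
   Assumption: float is 32 bits wide (IEEE 754 binary32). */
uint32_t float_bits(float x)
{
    union {
        float f;
        uint32_t u;
    } pun;

    pun.f = x;    /* store through member f */
    return pun.u; /* read through member u, by name: the bytes of the
                     float are reinterpreted as a uint32_t */
}
```

On an implementation that meets the assumption, float_bits(1.0f) yields 0x3F800000. Both accesses use the union member names, which is the case the quoted text describes.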