strict aliasing and alignment

2019-01-13 14:17发布

问题:

I need a safe way to alias between arbitrary POD types, conforming to ISO-C++11 explicitly considering 3.10/10 and 3.11 of n3242 or later. There are a lot of questions about strict aliasing here, most of them regarding C and not C++. I found a "solution" for C which uses unions, probably using this section

union type that includes one of the aforementioned types among its elements or nonstatic data members

From that I built this.

#include <iostream>

template <typename T, typename U>
T& access_as(U* p)
{
    union dummy_union
    {
        U dummy;
        T destination;
    };

    dummy_union* u = (dummy_union*)p;

    return u->destination;
}

struct test
{
    short s;
    int i;
};

int main()
{
    int buf[2];

    static_assert(sizeof(buf) >= sizeof(double), "");
    static_assert(sizeof(buf) >= sizeof(test), "");

    access_as<double>(buf) = 42.1337;
    std::cout << access_as<double>(buf) << '\n';

    access_as<test>(buf).s = 42;
    access_as<test>(buf).i = 1234;

    std::cout << access_as<test>(buf).s << '\n';
    std::cout << access_as<test>(buf).i << '\n';
}

My question is, just to be sure, is this program legal according to the standard?*

It doesn't give any warnings whatsoever and works fine when compiling with MinGW/GCC 4.6.2 using:

g++ -std=c++0x -Wall -Wextra -O3 -fstrict-aliasing -o alias.exe alias.cpp

* Edit: And if not, how could one modify this to be legal?

回答1:

This will never be legal, no matter what kind of contortions you perform with weird casts and unions and whatnot.

The fundamental fact is this: two objects of different type may never alias in memory, with a few special exceptions (see further down).

Example

Consider the following code:

void sum(double& out, float* in, int count) {
    for(int i = 0; i < count; ++i) {
        out += *in++;
    }
}

Let's break that out into local register variables to model actual execution more closely:

void sum(double& out, float* in, int count) {
    for(int i = 0; i < count; ++i) {
        register double out_val = out; // (1)
        register double in_val = *in; // (2)
        register double tmp = out_val + in_val;
        out = tmp; // (3)
        in++;
    }
}

Suppose that (1), (2) and (3) represent a memory read, read and write, respectively, which can be very expensive operations in such a tight inner loop. A reasonable optimization for this loop would be the following:

void sum(double& out, float* in, int count) {
    register double tmp = out; // (1)
    for(int i = 0; i < count; ++i) {
        register double in_val = *in; // (2)
        tmp = tmp + in_val;
        in++;
    }
    out = tmp; // (3)
}

This optimization reduces the number of memory reads needed by half and the number of memory writes to 1. This can have a huge impact on the performance of the code and is a very important optimization for all optimizing C and C++ compilers.

Now, suppose that we don't have strict aliasing. Suppose that a write to an object of any type can affect any other object. Suppose that writing to a double can affect the value of a float somewhere. This makes the above optimization suspect, because it's possible the programmer has in fact intended for out and in to alias so that the sum function's result is more complicated and is affected by the process. Sounds stupid? Even so, the compiler cannot distinguish between "stupid" and "smart" code. The compiler can only distinguish between well-formed and ill-formed code. If we allow free aliasing, then the compiler must be conservative in its optimizations and must perform the extra store (3) in each iteration of the loop.

Hopefully you can see now why no such union or cast trick can possibly be legal. You cannot circumvent fundamental concepts like this by sleight of hand.

Exceptions to strict aliasing

The C and C++ standards make special provision for aliasing any type with char, and with any "related type" which among others includes derived and base types, and members, because being able to use the address of a class member independently is so important. You can find an exhaustive list of these provisions in this answer.

Furthermore, GCC makes special provision for reading from a different member of a union than what was last written to. Note that this kind of conversion-through-union does not in fact allow you to violate aliasing. Only one member of a union is allowed to be active at any one time, so for example, even with GCC the following would be undefined behavior:

union {
    double d;
    float f[2];
};
f[0] = 3.0f;
f[1] = 5.0f;
sum(d, f, 2); // UB: attempt to treat two members of
              // a union as simultaneously active

Workarounds

The only standard way to reinterpret the bits of one object as the bits of an object of some other type is to use an equivalent of memcpy. This makes use of the special provision for aliasing with char objects, in effect allowing you to read and modify the underlying object representation at the byte level. For example, the following is legal, and does not violate strict aliasing rules:

int a[2];
double d;
static_assert(sizeof(a) == sizeof(d));
memcpy(a, &d, sizeof(d));

This is semantically equivalent to the following code:

int a[2];
double d;
static_assert(sizeof(a) == sizeof(d));
for(size_t i = 0; i < sizeof(a); ++i)
   ((char*)a)[i] = ((char*)&d)[i];

GCC makes a provision for reading from an inactive union member, implicitly making it active. From the GCC documentation:

The practice of reading from a different union member than the one most recently written to (called “type-punning”) is common. Even with -fstrict-aliasing, type-punning is allowed, provided the memory is accessed through the union type. So, the code above will work as expected. See Structures unions enumerations and bit-fields implementation. However, this code might not:

int f() {
    union a_union t;
    int* ip;
    t.d = 3.0;
    ip = &t.i;
    return *ip;
}

Similarly, access by taking the address, casting the resulting pointer and dereferencing the result has undefined behavior, even if the cast uses a union type, e.g.:

int f() {
    double d = 3.0;
    return ((union a_union *) &d)->i;
} 

Placement new

(Note: I'm going by memory here as I don't have access to the standard right now). Once you placement-new an object into a storage buffer, the lifetime of the underlying storage objects ends implicitly. This is similar to what happens when you write to a member of a union:

union {
    int i;
    float f;
} u;

// No member of u is active. Neither i nor f refer to an lvalue of any type.
u.i = 5;
// The member u.i is now active, and there exists an lvalue (object)
// of type int with the value 5. No float object exists.
u.f = 5.0f;
// The member u.i is no longer active,
// as its lifetime has ended with the assignment.
// The member u.f is now active, and there exists an lvalue (object)
// of type float with the value 5.0f. No int object exists.

Now, let's look at something similar with placement-new:

#define MAX_(x, y) ((x) > (y) ? (x) : (y))
// new returns suitably aligned memory
char* buffer = new char[MAX_(sizeof(int), sizeof(float))];
// Currently, only char objects exist in the buffer.
new (buffer) int(5);
// An object of type int has been constructed in the memory pointed to by buffer,
// implicitly ending the lifetime of the underlying storage objects.
new (buffer) float(5.0f);
// An object of type int has been constructed in the memory pointed to by buffer,
// implicitly ending the lifetime of the int object that previously occupied the same memory.

This kind of implicit end-of-lifetime can only occur for types with trivial constructors and destructors, for obvious reasons.



回答2:

Aside from the error when sizeof(T) > sizeof(U), the problem there could be, that the union has an appropriate and possibly higher alignment than U, because of T. If you don't instantiate this union, so that its memory block is aligned (and large enough!) and then fetch the member with destination type T, it will break silently in the worst case.

For example, an alignment error occurs, if you do the C-style cast of U*, where U requires 4 bytes alignment, to dummy_union*, where dummy_union requires alignment to 8 bytes, because alignof(T) == 8. After that, you possibly read the union member with type T aligned at 4 instead of 8 bytes.


Alias cast (alignment & size safe reinterpret_cast for PODs only):

This proposal does explicitly violate strict aliasing, but with static assertions:

///@brief Compile time checked reinterpret_cast where destAlign <= srcAlign && destSize <= srcSize
template<typename _TargetPtrType, typename _ArgType>
inline _TargetPtrType alias_cast(_ArgType* const ptr)
{
    //assert argument alignment at runtime in debug builds
    assert(uintptr_t(ptr) % alignof(_ArgType) == 0);

    typedef typename std::tr1::remove_pointer<_TargetPtrType>::type target_type;
    static_assert(std::tr1::is_pointer<_TargetPtrType>::value && std::tr1::is_pod<target_type>::value, "Target type must be a pointer to POD");
    static_assert(std::tr1::is_pod<_ArgType>::value, "Argument must point to POD");
    static_assert(std::tr1::is_const<_ArgType>::value ? std::tr1::is_const<target_type>::value : true, "const argument must be cast to const target type");
    static_assert(alignof(_ArgType) % alignof(target_type) == 0, "Target alignment must be <= source alignment");
    static_assert(sizeof(_ArgType) >= sizeof(target_type), "Target size must be <= source size");

    //reinterpret cast doesn't remove a const qualifier either
    return reinterpret_cast<_TargetPtrType>(ptr);
}

Usage with pointer type argument ( like standard cast operators such as reinterpret_cast ):

int* x = alias_cast<int*>(any_ptr);

Another approach (circumvents alignment and aliasing issues using a temporary union):

template<typename ReturnType, typename ArgType>
inline ReturnType alias_value(const ArgType& x)
{
    //test argument alignment at runtime in debug builds
    assert(uintptr_t(&x) % alignof(ArgType) == 0);

    static_assert(!std::tr1::is_pointer<ReturnType>::value ? !std::tr1::is_const<ReturnType>::value : true, "Target type can't be a const value type");
    static_assert(std::tr1::is_pod<ReturnType>::value, "Target type must be POD");
    static_assert(std::tr1::is_pod<ArgType>::value, "Argument must be of POD type");

    //assure, that we don't read garbage
    static_assert(sizeof(ReturnType) <= sizeof(ArgType),"Target size must be <= argument size");

    union dummy_union
    {
        ArgType x;
        ReturnType r;
    };

    dummy_union dummy;
    dummy.x = x;

    return dummy.r;
}

Usage:

struct characters
{
    char c[5];
};

//.....

characters chars;

chars.c[0] = 'a';
chars.c[1] = 'b';
chars.c[2] = 'c';
chars.c[3] = 'd';
chars.c[4] = '\0';

int r = alias_value<int>(chars);

The disadvantage of this is, that the union may require more memory than actually needed for the ReturnType


Wrapped memcpy (circumvents alignment and aliasing issues using memcpy):

template<typename ReturnType, typename ArgType>
inline ReturnType alias_value(const ArgType& x)
{
    //assert argument alignment at runtime in debug builds
    assert(uintptr_t(&x) % alignof(ArgType) == 0);

    static_assert(!std::tr1::is_pointer<ReturnType>::value ? !std::tr1::is_const<ReturnType>::value : true, "Target type can't be a const value type");
    static_assert(std::tr1::is_pod<ReturnType>::value, "Target type must be POD");
    static_assert(std::tr1::is_pod<ArgType>::value, "Argument must be of POD type");

    //assure, that we don't read garbage
    static_assert(sizeof(ReturnType) <= sizeof(ArgType),"Target size must be <= argument size");

    ReturnType r;
    memcpy(&r,&x,sizeof(ReturnType));

    return r;
}

For dynamic sized arrays of any POD type:

template<typename ReturnType, typename ElementType>
ReturnType alias_value(const ElementType* const array,const size_t size)
{
    //assert argument alignment at runtime in debug builds
    assert(uintptr_t(array) % alignof(ElementType) == 0);

    static const size_t min_element_count = (sizeof(ReturnType) / sizeof(ElementType)) + (sizeof(ReturnType) % sizeof(ElementType) != 0 ? 1 : 0);

    static_assert(!std::tr1::is_pointer<ReturnType>::value ? !std::tr1::is_const<ReturnType>::value : true, "Target type can't be a const value type");
    static_assert(std::tr1::is_pod<ReturnType>::value, "Target type must be POD");
    static_assert(std::tr1::is_pod<ElementType>::value, "Array elements must be of POD type");

    //check for minimum element count in array
    if(size < min_element_count)
        throw std::invalid_argument("insufficient array size");

    ReturnType r;
    memcpy(&r,array,sizeof(ReturnType));
    return r;
}

More efficient approaches may do explicit unaligned reads with intrinsics, like the ones from SSE, to extract primitives.


Examples:

struct sample_struct
{
    char c[4];
    int _aligner;
};

int test(void)
{
    const sample_struct constPOD    = {};
    sample_struct pod               = {};
    const char* str                 = "abcd";

    const int* constIntPtr  = alias_cast<const int*>(&constPOD);
    void* voidPtr           = alias_value<void*>(pod);
    int intValue            = alias_value<int>(str,strlen(str));

    return 0;
}

EDITS:

  • Assertions to assure conversion of PODs only, may be improved.
  • Removed superfluous template helpers, now using tr1 traits only
  • Static assertions for clarification and prohibition of const value (non-pointer) return type
  • Runtime assertions for debug builds
  • Added const qualifiers to some function arguments
  • Another type punning function using memcpy
  • Refactoring
  • Small example


回答3:

I think that at the most fundamental level, this is impossible and violates strict aliasing. The only thing you've achieved is tricking the compiler into not noticing.



回答4:

My question is, just to be sure, is this program legal according to the standard?

No. The alignment may be unnatural using the alias you have provided. The union you wrote just moves the point of the alias. It may appear to work, but that program may fail when CPU options, ABI, or compiler settings change.

And if not, how could one modify this to be legal?

Create natural temporary variables and treat your storage as a memory blob (moving in and out of the blob to/from temporaries), or use a union which represents all your types (remember, one active element at a time here).