SIMD and dynamic memory allocation [duplicate]

Posted 2019-02-18 09:55

Question:

Possible Duplicate:
SSE, intrinsics, and alignment

I'm new to SIMD programming, so please excuse me if I'm asking an obvious question.

I was experimenting a bit and got to a point where I want to store a SIMD value in a dynamically allocated structure.

Here's the code:

#include <xmmintrin.h>  // __m128, _mm_setzero_ps

struct SimdTest
{
    __m128      m_simdVal;

    void setZero()
    {
        __m128 tmp = _mm_setzero_ps(); 
        m_simdVal = tmp; // <<--- CRASH ---
    }
};

TEST( Plane, dynamicallyAllocatedPlane )
{
    SimdTest* test = new SimdTest();

    test->setZero();

    delete test;
}

When the line marked with the CRASH comment executes, the code crashes with the following exception:

Unhandled exception at 0x775315de in test-core.exe: 0xC0000005: Access violation reading location 0x00000000

Could someone please explain why the assignment breaks, and how SIMD-containing objects should be allocated dynamically so that they work?

I need to add that if I statically instantiate a SimdTest object and call the setZero method, everything works fine.

Thanks, Paksas

Answer 1:

It dies because the structure is misaligned. The CRT allocator only promises 8-byte alignment in a 32-bit build, but __m128 requires 16. On MSVC you'll need to use _aligned_malloc() to get properly aligned heap-allocated memory.
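
To confirm that misalignment is really the cause, it can help to check the address the allocator handed back. The following is a minimal sketch; isAligned is a hypothetical helper name, not something from the CRT or the original post.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (an assumption, not part of any library):
// returns true when `ptr` sits on an `alignment`-byte boundary.
// `alignment` must be a non-zero power of two.
inline bool isAligned(const void* ptr, std::size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    // For a power-of-two alignment, the low bits of an aligned address are zero.
    return (reinterpret_cast<std::uintptr_t>(ptr) & (alignment - 1)) == 0;
}
```

Calling isAligned(test, 16) right after the new expression in the question would be one way to see whether the object landed on a 16-byte boundary before setZero() touches it.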

Two ways to go about it. Since this is a POD struct, you could just cast:

#include <malloc.h>
...
    // sizeof applied to a type name needs parentheses
    SimdTest* test = (SimdTest*)_aligned_malloc(sizeof(SimdTest), 16);
    test->setZero();
    _aligned_free(test);

Or you could override the new/delete operators for the struct:

struct SimdTest
{
    // Class-specific allocation so every heap-allocated SimdTest is 16-byte aligned
    void* operator new(size_t size) { return _aligned_malloc(size, 16); }
    void operator delete(void* mem) { _aligned_free(mem); }
    // etc..
};
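
For reference, here is a fuller sketch of the operator new/delete approach that also builds outside MSVC. The posix_memalign branch, the std::bad_alloc on failure, and the alignment assert in setZero() are assumptions added for illustration, not part of the answer above. It covers only MSVC and POSIX toolchains, and only single-object new, not new[].

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>
#include <xmmintrin.h>
#if defined(_MSC_VER)
#include <malloc.h>   // _aligned_malloc, _aligned_free
#else
#include <stdlib.h>   // posix_memalign, free (assumes a POSIX system)
#endif

struct SimdTest
{
    __m128 m_simdVal;

    void setZero()
    {
        // Debug check (an addition for illustration): catch misalignment
        // here instead of crashing inside the store.
        assert(reinterpret_cast<std::uintptr_t>(this) % 16 == 0);
        m_simdVal = _mm_setzero_ps();
    }

    // Class-specific operator new: every `new SimdTest` gets 16-byte aligned storage.
    static void* operator new(std::size_t size)
    {
#if defined(_MSC_VER)
        void* mem = _aligned_malloc(size, 16);
#else
        void* mem = nullptr;
        if (posix_memalign(&mem, 16, size) != 0)
            mem = nullptr;
#endif
        if (!mem)
            throw std::bad_alloc();   // operator new must not return null
        return mem;
    }

    // Must release with the function matching the allocator used above.
    static void operator delete(void* mem) noexcept
    {
#if defined(_MSC_VER)
        _aligned_free(mem);
#else
        free(mem);
#endif
    }
};
```

With this in place the test from the question can stay as written: new SimdTest() and delete test pick up the class-specific operators automatically.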


Answer 2:

MSDN states that _m128 values are automatically aligned to 16 bytes (_m128, not __m128). But I guess the others are right anyway. As I recall, there are two kinds of move instruction: movaps for aligned data and movups for unaligned data. The first requires 16-byte alignment and the second doesn't. I don't know whether the compiler is capable of using both, but I'd try this _m128 type.

Actually, there is a special type for that: _M128A.
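
The aligned/unaligned distinction is also visible at the intrinsics level: _mm_load_ps and _mm_store_ps require 16-byte aligned addresses, while _mm_loadu_ps and _mm_storeu_ps accept any address. Below is a minimal sketch using the unaligned variants; add4Unaligned is a hypothetical helper name chosen for illustration.

```cpp
#include <cassert>
#include <xmmintrin.h>

// Hypothetical helper (an assumption for illustration): adds four floats
// from `a` and `b` and writes the sums to `out`. None of the pointers
// needs to be 16-byte aligned.
inline void add4Unaligned(const float* a, const float* b, float* out)
{
    assert(a && b && out);
    __m128 va = _mm_loadu_ps(a);             // unaligned load
    __m128 vb = _mm_loadu_ps(b);             // unaligned load
    _mm_storeu_ps(out, _mm_add_ps(va, vb));  // unaligned store
}
```

This sidesteps the crash when the data's alignment cannot be controlled, but it does not help with a __m128 member inside a misaligned object, because the compiler is free to assume that member is aligned.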