SSE and C++ containers

Posted 2019-01-24 06:38

Is there an obvious reason why the following code segfaults?

#include <vector>
#include <emmintrin.h>

struct point {
    __m128i v;

  point() {
    v = _mm_setr_epi32(0, 0, 0, 0);
  }
};

int main(int argc, char *argv[])
{
  std::vector<point> a(3);
}

Thanks

Edit: I'm using g++ 4.5.0 on Linux/i686. I might not know what I'm doing here, but since even the following segfaults

int main(int argc, char *argv[])
{
  point *p = new point();
}

I really think it must be an alignment issue.
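To test the alignment hypothesis directly, the sketch below asks plain operator new for sizeof(point) bytes and reports whether the address is a multiple of alignof(point). It does this without constructing a point in storage that might be misaligned. It assumes C++11 or later, since it uses alignof and static_assert. is_aligned and raw_new_is_aligned_for_point are hypothetical helper names. On a platform whose allocator only guarantees 8-byte alignment, you would expect it to report misaligned blocks at least some of the time.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <new>
#include <emmintrin.h>

struct point {
    __m128i v;
    point() { v = _mm_setr_epi32(0, 0, 0, 0); }
};

// __m128i carries a 16-byte alignment requirement, and point inherits it.
static_assert(alignof(point) == 16, "point inherits __m128i's 16-byte alignment");

// Hypothetical helper: true if p lies on an `align`-byte boundary.
bool is_aligned(const void* p, std::size_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);  // expects a power of two
    return reinterpret_cast<std::uintptr_t>(p) % align == 0;
}

// Hypothetical helper: get raw storage from operator new and report whether it
// is aligned for point, without constructing a point there.
bool raw_new_is_aligned_for_point() {
    void* raw = ::operator new(sizeof(point));
    bool ok = is_aligned(raw, alignof(point));
    std::printf("operator new returned %p: %s\n", raw,
                ok ? "suitably aligned" : "NOT 16-byte aligned");
    ::operator delete(raw);
    return ok;
}
```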

4 answers
Ridiculous、
Reply #2 · 2019-01-24 07:25

The obvious thing that could have gone wrong would be if v wasn't aligned properly.

But it's allocated dynamically by vector, so it isn't subject to stack misalignment issues.

However, as phooji correctly points out, a "template" or "prototype" value is passed to the std::vector constructor which will be copied to all the elements of the vector. It's this parameter of std::vector::vector that will be placed on the stack and may be misaligned.

Some compilers have a pragma for controlling stack alignment within a function (basically, the compiler wastes some extra space as needed to get all locals properly aligned).

According to the Microsoft documentation, Visual C++ 2010 should set up 16-byte stack alignment automatically for SSE types, and has done so since Visual C++ 2003.

For gcc I don't know.


Under C++0x, for new point() to return unaligned storage is a serious non-compliance. [basic.stc.dynamic.allocation] says (wording from draft n3225):

The allocation function attempts to allocate the requested amount of storage. If it is successful, it shall return the address of the start of a block of storage whose length in bytes shall be at least as large as the requested size. There are no constraints on the contents of the allocated storage on return from the allocation function. The order, contiguity, and initial value of storage allocated by successive calls to an allocation function are unspecified. The pointer returned shall be suitably aligned so that it can be converted to a pointer of any complete object type with a fundamental alignment requirement (3.11) and then used to access the object or array in the storage allocated (until the storage is explicitly deallocated by a call to a corresponding deallocation function).

And [basic.align] says:

Additionally, a request for runtime allocation of dynamic storage for which the requested alignment cannot be honored shall be treated as an allocation failure.

Can you try a newer version of gcc where this might be fixed?
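For reference, the language later addressed this directly. C++17 added alignment-aware operator new (proposal P0035R4), which GCC supports from version 7 with -std=c++17. The following is a minimal sketch that assumes such a compiler. make_point, make_points and is_16_aligned are hypothetical names. Under C++17 a new-expression for an over-aligned type calls operator new(std::size_t, std::align_val_t), and std::allocator<T>::allocate must return storage aligned for T. Both the new point() case and the std::vector<point> case should therefore get 16-byte aligned storage.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>
#include <emmintrin.h>

struct point {
    __m128i v;
    point() { v = _mm_setr_epi32(0, 0, 0, 0); }
};

bool is_16_aligned(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}

// C++17: if alignof(point) exceeds __STDCPP_DEFAULT_NEW_ALIGNMENT__, this
// new-expression calls the std::align_val_t overload of operator new;
// otherwise the ordinary operator new already has to provide 16 bytes.
std::unique_ptr<point> make_point() {
    std::unique_ptr<point> p(new point());
    assert(is_16_aligned(p.get()));
    return p;
}

// C++17: std::allocator<point>::allocate returns storage aligned for point.
std::vector<point> make_points(std::size_t n) {
    std::vector<point> a(n);
    for (const point& pt : a) assert(is_16_aligned(&pt));
    return a;
}
```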

干净又极端
Reply #3 · 2019-01-24 07:30

The vector constructor you are using is actually defined like this:

explicit vector ( size_type n, const T& value= T(), const Allocator& = Allocator() );

(see e.g., http://www.cplusplus.com/reference/stl/vector/vector/).

In other words, a single point is default-constructed as the default argument value (because you pass only n), and all n elements are then copy-constructed from it. My guess is that you need a copy constructor for point that properly handles the (non-)copying of __m128i values.
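An instrumented type makes this construction sequence visible. The sketch assumes C++11, and counted, build_like_cxx03 and build_like_cxx11 are hypothetical names. vector(n, value) default-constructs one prototype and copy-constructs every element from it. C++11 instead has a separate vector(size_type n) overload, which value-initializes each element in place and needs no prototype at all.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical instrumented type: counts how the vector builds its elements.
struct counted {
    static int default_ctors;
    static int copy_ctors;
    counted() { ++default_ctors; }
    counted(const counted&) { ++copy_ctors; }
};
int counted::default_ctors = 0;
int counted::copy_ctors = 0;

// What `std::vector<point> a(3);` meant with a C++03 library:
// one prototype (the default argument), then n copies of it.
void build_like_cxx03(std::size_t n) {
    counted::default_ctors = counted::copy_ctors = 0;
    std::vector<counted> v(n, counted());
    assert(v.size() == n);
}

// C++11 vector(size_type n): each element is value-initialized in place.
void build_like_cxx11(std::size_t n) {
    counted::default_ctors = counted::copy_ctors = 0;
    std::vector<counted> v(n);
    assert(v.size() == n);
}
```

After build_like_cxx03(3) the counters read 1 default construction and 3 copies. After build_like_cxx11(3) they read 3 and 0.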

Update: When I try to build your code with Visual Studio 2010 (v. 10.0.30319.1), I get the following build error:

error C2719: '_Val': formal parameter with __declspec(align('16')) won't be aligned c:\program files\microsoft visual studio 10.0\vc\include\vector 870 1   meh

This suggests Ben is right on the money regarding this being an alignment problem.

家丑人穷心不美
Reply #4 · 2019-01-24 07:39

SSE types such as __m128 and __m128i must be 16-byte aligned in memory, because aligned loads and stores such as movaps/movdqa fault on a misaligned address. When you declare an __m128 on the stack there's no problem, because the compiler aligns it automatically. The default allocator for std::vector<>, which handles dynamic memory allocation, is not guaranteed to produce 16-byte aligned allocations.

ゆ 、 Hurt°
Reply #5 · 2019-01-24 07:40

There is a possibility that the memory that is allocated by the default allocator in your compiler's STL implementation is not aligned. This will be dependent on the specific platform and compiler vendor.

The default allocator typically uses operator new, which only has to return memory aligned for types with a fundamental alignment requirement. That is not necessarily the 16 bytes SSE types need. To solve the problem, it may be necessary to implement a custom allocator that uses _aligned_malloc (MSVC), or posix_memalign or _mm_malloc on Linux.
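Here is a minimal C++11 sketch of such an allocator. It is a sketch, not a complete production allocator. It uses _mm_malloc/_mm_free (declared through the SSE headers on GCC, Clang and MSVC) instead of the MSVC-only _aligned_malloc, so it should also build on Linux. aligned_allocator and point_vector are hypothetical names. It only fixes the heap side. With a C++03 library, std::vector<point, ...> a(3) would still build its prototype argument on the stack.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>
#include <vector>
#include <emmintrin.h>  // pulls in _mm_malloc / _mm_free on GCC, Clang and MSVC

// Hypothetical minimal allocator returning Align-byte aligned blocks.
template <class T, std::size_t Align = 16>
struct aligned_allocator {
    using value_type = T;
    // Needed because allocator_traits cannot rebind a non-type parameter.
    template <class U> struct rebind { using other = aligned_allocator<U, Align>; };

    aligned_allocator() noexcept {}
    template <class U>
    aligned_allocator(const aligned_allocator<U, Align>&) noexcept {}

    T* allocate(std::size_t n) {
        if (n > static_cast<std::size_t>(-1) / sizeof(T)) throw std::bad_alloc();
        void* p = _mm_malloc(n * sizeof(T), Align);
        if (!p) throw std::bad_alloc();
        assert(reinterpret_cast<std::uintptr_t>(p) % Align == 0);
        return static_cast<T*>(p);
    }
    void deallocate(T* p, std::size_t) noexcept { _mm_free(p); }
};

// All instances are interchangeable: memory from one can be freed by another.
template <class T, class U, std::size_t A>
bool operator==(const aligned_allocator<T, A>&, const aligned_allocator<U, A>&) noexcept { return true; }
template <class T, class U, std::size_t A>
bool operator!=(const aligned_allocator<T, A>&, const aligned_allocator<U, A>&) noexcept { return false; }

struct point {
    __m128i v;
    point() { v = _mm_setr_epi32(0, 0, 0, 0); }
};

using point_vector = std::vector<point, aligned_allocator<point>>;
```

Usage: declare point_vector a(3); instead of std::vector<point> a(3);.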

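Also, a simple fix (although not a satisfactory one) would be to build the value in a local __m128i variable, then copy that variable into the struct with an unaligned store instruction. It is not satisfactory partly because the implicitly generated copy constructor may still use aligned moves. Example: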

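#include <emmintrin.h>

struct point {
    __m128i v;
    point() {
        __m128i temp = _mm_setr_epi32(0, 0, 0, 0);  // local: the compiler aligns it
        _mm_storeu_si128(&v, temp);                 // unaligned store, tolerates a misaligned v
    }
};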
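An alternative that sidesteps alignment entirely is a different technique from the one in this answer. The sketch below assumes SSE2, and point_u is a hypothetical name. It stores the four lanes in a plain int array, so the struct needs only int alignment, and converts to and from __m128i with unaligned loads and stores. Every copy is then a plain int-array copy, so neither the allocator nor the vector's prototype argument needs 16-byte alignment. The trade-off is that unaligned loads and stores may be slower on older CPUs.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>
#include <emmintrin.h>

// Hypothetical variant: lanes live in an int array, so point_u has no
// SSE alignment requirement; data moves via unaligned load/store only.
struct point_u {
    int lanes[4];

    point_u() { store(_mm_setr_epi32(0, 0, 0, 0)); }

    __m128i load() const {
        return _mm_loadu_si128(reinterpret_cast<const __m128i*>(lanes));
    }
    void store(__m128i x) {
        _mm_storeu_si128(reinterpret_cast<__m128i*>(lanes), x);
    }
};

static_assert(alignof(point_u) == alignof(int), "no 16-byte alignment requirement");

// vector(n, value) copies a prototype; here that is just an int-array copy.
std::vector<point_u> make_points(std::size_t n) {
    std::vector<point_u> a(n, point_u());
    assert(a.size() == n);
    return a;
}
```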