Is there an obvious reason why the following code segfaults?
#include <vector>
#include <emmintrin.h>

struct point {
    __m128i v;
    point() {
        v = _mm_setr_epi32(0, 0, 0, 0);
    }
};

int main(int argc, char *argv[])
{
    std::vector<point> a(3);
}
Thanks
Edit: I'm using g++ 4.5.0 on linux/i686. I might not know what I'm doing here, but even the following segfaults:
int main(int argc, char *argv[])
{
    point *p = new point();
}
So I really think it must be an alignment issue.
The obvious thing that could have gone wrong would be if v wasn't aligned properly.
But it's allocated dynamically by vector, so it isn't subject to stack misalignment issues.
However, as phooji correctly points out, a "template" or "prototype" value is passed to the std::vector constructor, which will be copied to all the elements of the vector. It's this parameter of std::vector::vector that will be placed on the stack and may be misaligned.
Some compilers have a pragma for controlling stack alignment within a function (basically, the compiler wastes some extra space as needed to get all locals properly aligned).
According to the Microsoft documentation, Visual C++ 2010 should set up 8-byte stack alignment automatically for SSE types, and has done so since Visual C++ 2003.
For gcc I don't know.
Under C++0x, for new point() to return unaligned storage is a serious non-compliance: see [basic.stc.dynamic.allocation] and [basic.align] (wording from draft n3225).
Can you try a newer version of gcc where this might be fixed?
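A quick way to test the claim about new is to look at the address it returns. The following is a minimal sketch, not the asker's code: aligned_point is a hypothetical stand-in that uses alignas(16) (C++11) to impose the same 16-byte requirement as an __m128i member, so the check does not depend on SSE headers, and the helper names are made up for illustration.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the question's point: alignas(16) imposes the
// same 16-byte requirement that an __m128i member would.
struct alignas(16) aligned_point {
    std::int32_t lanes[4];
};

// True when the address p satisfies the alignment requirement of T.
template <typename T>
bool is_aligned_for(const T* p) {
    return reinterpret_cast<std::uintptr_t>(p) % alignof(T) == 0;
}

// Allocates one object with plain new and reports whether the returned
// address is suitably aligned for the type.
bool new_returns_aligned_storage() {
    aligned_point* p = new aligned_point();
    bool ok = is_aligned_for(p);
    delete p;
    return ok;
}
```

If new_returns_aligned_storage() comes back false on a given compiler and platform, aligned SSE loads and stores on heap-allocated objects of that type can fault, which would match the crash described in the question.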
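The vector constructor you are using is actually defined like this (C++03 signature):
explicit vector(size_type n, const T& value = T(), const Allocator& = Allocator());
(see e.g., http://www.cplusplus.com/reference/stl/vector/vector/).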
In other words, one prototype value is default constructed (i.e., the default parameter value used when you call the constructor), and the elements of the vector are then created by copying it. My guess is that you need a copy constructor for point that properly handles the (non-)copying of __m128i values.
Update: When I try to build your code with Visual Studio 2010 (v. 10.0.30319.1), I get the following build error:
This suggests Ben is right on the money regarding this being an alignment problem.
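The default-construct-then-copy behaviour can be observed by counting constructor calls. This is a small sketch with a hypothetical counted type, not anything from the question. It passes the prototype explicitly, because from C++11 onward a bare a(3) default-constructs each element in place instead of copying a prototype.

```cpp
#include <vector>

// Hypothetical element type that records how it was constructed.
struct counted {
    static int default_ctor_calls;
    static int copy_ctor_calls;
    counted() { ++default_ctor_calls; }
    counted(const counted&) { ++copy_ctor_calls; }
};
int counted::default_ctor_calls = 0;
int counted::copy_ctor_calls = 0;

// Builds a three-element vector from an explicit prototype value, which is
// what std::vector<point> a(3) amounts to under the C++03 signature.
void count_fill_constructions() {
    counted::default_ctor_calls = 0;
    counted::copy_ctor_calls = 0;
    std::vector<counted> a(3, counted());  // one temporary, three copies
}
```

After count_fill_constructions() runs, the counters show one default construction (the temporary, which lives on the stack) and three copy constructions (the vector's elements).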
SSE types such as __m128 are required to be 16-byte aligned in memory. When you allocate an __m128 on the stack, there's no problem because the compiler automatically aligns these correctly. The default allocator for std::vector<>, which handles dynamic memory allocation, is not guaranteed to produce 16-byte-aligned allocations.
There is a possibility that the memory allocated by the default allocator in your compiler's STL implementation is not aligned. This depends on the specific platform and compiler vendor.
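One way around a default allocator that does not align is to give the vector its own allocator. The sketch below is an illustration under stated assumptions, not a vetted implementation: aligned_allocator and lane4 are hypothetical names, it needs C++11, and it uses posix_memalign (POSIX, so suitable for the asker's Linux setup) where a Windows build would use _aligned_malloc and _aligned_free.

```cpp
#include <stdlib.h>   // posix_memalign, free (POSIX)
#include <cstddef>
#include <cstdint>
#include <new>        // std::bad_alloc
#include <vector>

// Hypothetical minimal allocator returning storage aligned to Alignment
// bytes. Alignment must be a power of two and a multiple of sizeof(void*).
template <typename T, std::size_t Alignment>
struct aligned_allocator {
    typedef T value_type;

    aligned_allocator() {}
    template <typename U>
    aligned_allocator(const aligned_allocator<U, Alignment>&) {}

    // Explicit rebind is needed because of the non-type template parameter.
    template <typename U>
    struct rebind { typedef aligned_allocator<U, Alignment> other; };

    T* allocate(std::size_t n) {
        void* p = nullptr;
        if (posix_memalign(&p, Alignment, n * sizeof(T)) != 0)
            throw std::bad_alloc();
        return static_cast<T*>(p);
    }
    void deallocate(T* p, std::size_t) { free(p); }
};

// All instances are interchangeable, so they always compare equal.
template <typename T, typename U, std::size_t A>
bool operator==(const aligned_allocator<T, A>&, const aligned_allocator<U, A>&) { return true; }
template <typename T, typename U, std::size_t A>
bool operator!=(const aligned_allocator<T, A>&, const aligned_allocator<U, A>&) { return false; }

// Hypothetical 16-byte-aligned element standing in for a struct with __m128i.
struct alignas(16) lane4 { std::int32_t lanes[4]; };

// Builds a vector with the allocator and checks where its buffer starts.
bool vector_storage_is_aligned(std::size_t count) {
    std::vector<lane4, aligned_allocator<lane4, 16> > v(count);
    return reinterpret_cast<std::uintptr_t>(v.data()) % 16 == 0;
}
```

This only aligns the vector's heap buffer. It does nothing for a temporary or parameter that ends up on a misaligned stack, which is the other failure mode discussed in this thread.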
Usually the default allocator uses operator new, which usually does not guarantee alignment beyond the word size (32-bit or 64-bit). To solve the problem, it may be necessary to implement a custom allocator which uses _aligned_malloc.
Also, a simple fix (although not a satisfactory one) would be to assign the value to a local __m128i variable, then copy that variable to the struct using an unaligned instruction. Example:
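A minimal sketch of that workaround, with one assumed variation: the hypothetical unaligned_point below keeps its four lanes as plain 32-bit integers rather than an __m128i member, so the struct itself has no 16-byte alignment requirement and the compiler never emits aligned moves when copying it. The intrinsics used are _mm_storeu_si128 and _mm_loadu_si128 from emmintrin.h, so this builds only on x86 targets with SSE2 enabled.

```cpp
#include <emmintrin.h>
#include <cstdint>

// Hypothetical variant of the question's point that tolerates any placement.
struct unaligned_point {
    std::int32_t lanes[4];  // plain storage: only 4-byte alignment needed

    unaligned_point() {
        // The local is aligned by the compiler; the store does not assume
        // that the destination is 16-byte aligned.
        __m128i tmp = _mm_setr_epi32(0, 0, 0, 0);
        _mm_storeu_si128(reinterpret_cast<__m128i*>(lanes), tmp);
    }

    // Reads the lanes back into a register with an unaligned load.
    __m128i load() const {
        return _mm_loadu_si128(reinterpret_cast<const __m128i*>(lanes));
    }
};
```

The cost is that every access goes through an unaligned load or store, which is why this counts as a workaround rather than a real fix.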