16-byte alignment issue

Posted 2019-08-03 05:54

Question:

I am using DirectXMath and storing XMMATRIX and XMVECTOR objects as members of my classes.

When I call XMMatrixMultiply, it throws an unhandled exception.

I have found online that this is a byte alignment issue: DirectXMath uses SIMD instructions, which expect 16-byte-aligned data, and heap allocations are not guaranteed to be aligned that way.

One of the proposed solutions was to store XMFLOAT4X4 variables and convert them to temporary XMMATRIX values whenever needed, but that isn't the nicest or fastest solution, in my opinion.

Another was to use _aligned_malloc, but I have no idea how to use it. I have never had to do any manual memory allocation, and it is black magic to me.

Another was to overload operator new, but no information was given on how to do it.

As for the overloading approach, I'm not using new to create the XMMATRIX objects, since I don't use them through pointers.

It was all working nicely until I decided to split the code into classes.

I think the _aligned_malloc solution would be best here, but I have no idea how to use it, or where and when to call it.

Answer 1:

Unlike XMFLOAT4X4 and XMFLOAT4, which are safe to store, XMMATRIX and XMVECTOR are aliases for hardware registers (SSE, NEON, etc.). Since the library is abstracting away the register type and alignment requirements, you shouldn't attempt to align the types yourself, since you can easily create a program that happens to work on your machine but fails on another. You should either use the safe types for storage (e.g. XMFLOAT4) or pull up the abstraction and use the vector instructions directly, with special storage and alignment code paths in your application for each vector extension you're trying to support.

Also, using these registers outside of the context of the library's vector instructions might cause unexpected failures for other reasons. For example, if you store an XMMATRIX in your own struct, some architectures might fail to create copies of the struct.
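The storage advice above can be sketched without DirectXMath itself. The types and functions below (Float4x4, SimdMatrix, load, store, multiply, Transform) are hypothetical stand-ins, not library code: Float4x4 plays the role of XMFLOAT4X4, SimdMatrix the role of XMMATRIX, and load/store mirror what XMLoadFloat4x4 and XMStoreFloat4x4 do. The multiply is scalar here purely for illustration.

```cpp
#include <cassert>

// Stand-in for XMFLOAT4X4: plain floats, no special alignment,
// so it is safe to keep as a class member on the heap.
struct Float4x4 {
    float m[4][4];
};

// Stand-in for XMMATRIX: same data, but required to be 16-byte aligned.
struct alignas(16) SimdMatrix {
    float m[4][4];
};

static_assert(alignof(Float4x4) == alignof(float), "storage type only needs float alignment");
static_assert(alignof(SimdMatrix) == 16, "compute type needs 16-byte alignment");

// Copy from the storage type into the aligned compute type.
inline SimdMatrix load(const Float4x4& src) {
    SimdMatrix out{};
    for (int r = 0; r < 4; ++r)
        for (int c = 0; c < 4; ++c)
            out.m[r][c] = src.m[r][c];
    return out;
}

// Copy from the aligned compute type back into the storage type.
inline void store(Float4x4& dst, const SimdMatrix& src) {
    for (int r = 0; r < 4; ++r)
        for (int c = 0; c < 4; ++c)
            dst.m[r][c] = src.m[r][c];
}

// Plain row-major matrix product a * b (a real library would use SIMD here).
inline SimdMatrix multiply(const SimdMatrix& a, const SimdMatrix& b) {
    SimdMatrix out{};
    for (int r = 0; r < 4; ++r)
        for (int c = 0; c < 4; ++c)
            for (int k = 0; k < 4; ++k)
                out.m[r][c] += a.m[r][k] * b.m[k][c];
    return out;
}

// The class stores only the unaligned type; aligned values exist
// only as locals, where the compiler takes care of their alignment.
class Transform {
public:
    void apply(const Float4x4& other) {
        SimdMatrix a = load(world_);
        SimdMatrix b = load(other);
        store(world_, multiply(a, b));
    }
    const Float4x4& world() const { return world_; }

private:
    // Starts as the identity matrix.
    Float4x4 world_{{{1.0f, 0.0f, 0.0f, 0.0f},
                     {0.0f, 1.0f, 0.0f, 0.0f},
                     {0.0f, 0.0f, 1.0f, 0.0f},
                     {0.0f, 0.0f, 0.0f, 1.0f}}};
};

// Usage: an ordinary heap allocation is fine, since Transform holds plain floats.
inline void demo_transform() {
    Transform* t = new Transform;
    Float4x4 scale{{{2.0f, 0.0f, 0.0f, 0.0f},
                    {0.0f, 3.0f, 0.0f, 0.0f},
                    {0.0f, 0.0f, 4.0f, 0.0f},
                    {0.0f, 0.0f, 0.0f, 1.0f}}};
    t->apply(scale);
    assert(t->world().m[1][1] == 3.0f);
    delete t;
}
```

The cost is a load and a store around each operation, which is the trade-off the asker was worried about; the benefit is that the class itself has no alignment requirement beyond that of float.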



Answer 2:

This doesn't pretend to be a complete answer.

There are some options that you didn't mention:

  • #define _XM_NO_INTRINSICS_ (before including DirectXMath.h). Simple. Slow. Works right now, just one line of code. ;)
  • Don't store XMVECTOR and XMMATRIX on the heap. Store XMFLOAT4 or XMFLOAT4X4 and convert to the SIMD types only when needed (so they will live on the stack). Slower. A lot of code to change (probably).
  • Don't store XMVECTOR and XMMATRIX on the heap, part 2. Just keep your classes on the stack. Fast. Pretty hard. A lot of code to change (probably).
  • Use an aligned allocator. Fast. Hard. Many hours of googling, a lot of code to write and debug.
  • Don't use the DirectXMath (previously XNA Math) library. Choose any other (there are plenty) or write your own. Fast. A lot of code to change (probably).

If you want an aligned allocator, that has nothing to do with DirectX or DirectXMath. It is an advanced topic. No one can give you a complete solution. But here are some search results:

  • returning aligned memory with new?
  • Harder to C++: Aligned Memory Allocation
  • many more

Be very careful. A bad memory allocator can introduce far more problems than it solves.
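For orientation only, here is a minimal sketch of what the aligned-allocation and operator new options can look like. It is not a complete allocator. The names aligned_alloc_bytes, aligned_free_bytes, is_aligned, Mat4 and Camera are hypothetical; Mat4 is a stand-in for a 16-byte-aligned type such as XMMATRIX. The sketch assumes _aligned_malloc/_aligned_free on Windows and posix_memalign/free elsewhere.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>
#if defined(_WIN32)
#include <malloc.h>   // _aligned_malloc, _aligned_free
#else
#include <stdlib.h>   // posix_memalign, free
#endif

// Hypothetical helper: returns memory whose address is a multiple of
// `alignment` (which must be a power of two), or nullptr on failure.
inline void* aligned_alloc_bytes(std::size_t size, std::size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
#if defined(_WIN32)
    return _aligned_malloc(size, alignment);
#else
    // posix_memalign additionally requires a multiple of sizeof(void*).
    if (alignment < sizeof(void*)) alignment = sizeof(void*);
    void* p = nullptr;
    return posix_memalign(&p, alignment, size) == 0 ? p : nullptr;
#endif
}

// Memory obtained from aligned_alloc_bytes must be released here,
// never with plain delete (and on Windows never with plain free).
inline void aligned_free_bytes(void* p) noexcept {
#if defined(_WIN32)
    _aligned_free(p);
#else
    free(p);
#endif
}

// True if the address of p is a multiple of `alignment`.
inline bool is_aligned(const void* p, std::size_t alignment) {
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}

// Stand-in for a 16-byte-aligned SIMD type such as XMMATRIX.
struct alignas(16) Mat4 {
    float m[16];
};

// A class that stores aligned members by value. The class-specific
// operator new/delete make `new Camera` return aligned memory.
class Camera {
public:
    Mat4 view{};
    Mat4 projection{};

    static void* operator new(std::size_t size) {
        if (void* p = aligned_alloc_bytes(size, alignof(Camera))) return p;
        throw std::bad_alloc();
    }
    static void operator delete(void* p) noexcept {
        aligned_free_bytes(p);
    }
};
```

With this in place, `Camera* cam = new Camera;` followed by `delete cam;` goes through the aligned functions, even though the Mat4 members themselves are never created with new. This sketch does not cover `new Camera[n]`, which uses operator new[], and any class that contains a Camera by value and is itself heap-allocated has the same requirement. From C++17 on, a new-expression for an over-aligned type passes the alignment to the allocation function automatically, which may make the manual overload unnecessary.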

Hope it helps somehow. Happy Coding! :)