I am using DirectXMath, creating XMMATRIX and XMVECTOR members in classes. When I call XMMatrixMultiply, it throws an unhandled exception.
I have found online that it is an issue with byte alignment: DirectXMath uses SIMD instruction sets, whose data must be aligned, and heap allocations can end up misaligned.
One of the proposed solutions was to store XMFLOAT4X4 variables and convert them to a temporary XMMATRIX whenever needed, but it isn't the nicest or fastest solution, IMO.
Another one was to use _aligned_malloc, yet I have no idea how to use it. I have never had to do any manual memory allocation, and it is black magic to me.
Another one was to overload operator new, yet they did not provide any information on how to do it.
And regarding the overloading method, I'm not using new to create XMMATRIX objects, since I don't use them as pointers.
It was all working nicely until I decided to split the code into classes.
I think the _aligned_malloc solution would be best here, but I have no idea how to use it, or where and when to call it.
Unlike XMFLOAT4X4 and XMFLOAT4, which are safe to store, XMMATRIX and XMVECTOR are aliases for hardware registers (SSE, NEON, etc.). Since the library is abstracting away the register type and alignment requirements, you shouldn't attempt to align the types yourself, since you can easily create a program that happens to work on your machine but fails on another. You should either use the safe types for storage (e.g. XMFLOAT4) or pull up the abstraction and use the vector instructions directly, with special storage and alignment code paths in your application for each vector extension you're trying to support.
Also, using these registers outside of the context of the library's vector instructions might cause unexpected failures for other reasons. For example, if you store an XMMATRIX in your own struct, some architectures might fail to create copies of the struct.
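To illustrate the two options above, here is a minimal sketch of the "plain storage, SIMD only inside the computation" pattern. It does not use DirectXMath itself: `Float4`, `Add`, `Example` and the `SKETCH_HAS_SSE` macro are hypothetical names for this example, with `Float4` standing in for XMFLOAT4. The unaligned load and store play the role that XMLoadFloat4 and XMStoreFloat4 play in the library, and the `#if` shows what a per-extension code path might look like, assuming SSE on x86 and plain C++ elsewhere.

```cpp
#include <cassert>

// Pick a code path per vector extension; plain C++ is the fallback.
#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define SKETCH_HAS_SSE 1
#endif

// Hypothetical storage type: four plain floats with no special alignment,
// playing the role that XMFLOAT4 plays in DirectXMath.
struct Float4 {
    float v[4];
};

// Adds two stored vectors. The SIMD registers exist only inside this function,
// so it does not matter where the Float4 objects live (stack, heap, class member).
inline Float4 Add(const Float4& a, const Float4& b) {
    Float4 out;
#if defined(SKETCH_HAS_SSE)
    const __m128 va = _mm_loadu_ps(a.v);       // unaligned load, valid for any address
    const __m128 vb = _mm_loadu_ps(b.v);
    _mm_storeu_ps(out.v, _mm_add_ps(va, vb));  // unaligned store
#else
    for (int i = 0; i < 4; ++i) out.v[i] = a.v[i] + b.v[i];
#endif
    return out;
}

// Usage: the operands are ordinary values with ordinary alignment.
inline void Example() {
    const Float4 a{{1.0f, 2.0f, 3.0f, 4.0f}};
    const Float4 b{{10.0f, 20.0f, 30.0f, 40.0f}};
    const Float4 c = Add(a, b);
    assert(c.v[0] == 11.0f && c.v[3] == 44.0f);
}
```

A class that stores only `Float4`-style members has no alignment requirement beyond that of `float`, so it can be created anywhere without special allocation.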
This does not pretend to be a complete answer.
There are some ways that you didn't mention:
- #define _XM_NO_INTRINSICS_. Simple. Slow. Works right now, just one line of code. ;)
- Don't store XMVECTOR and XMMATRIX on the heap. Store XMFLOAT4 or XMFLOAT4X4 and convert to SIMD types only when needed (so they will live on the stack). Slower. A lot of code to change (probably).
- Don't store XMVECTOR and XMMATRIX on the heap, part 2. Just keep your classes on the stack. Fast. Pretty hard. A lot of code to change (probably).
- Use an aligned allocator. Fast. Hard. Many hours of googling, a lot of code to write and debug.
- Don't use the DirectXMath (previously XNA Math) library. Choose any other (there are plenty) or write your own. Fast. A lot of code to change (probably).
If you want an aligned allocator, it has nothing to do with DirectX or DirectXMath. It is an advanced topic. No one can give you a complete solution. But here are some results of googling:
- returning aligned memory with new?
- Harder to C++: Aligned Memory Allocation
- many more
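For orientation only, here is a rough sketch of what the "overload operator new with an aligned allocation" idea can look like. It is not a complete allocator. `AlignedAlloc`, `AlignedFree`, `IsAligned`, `Vec4` and `Transform` are hypothetical names for this example, and `Vec4` is just a stand-in for a 16-byte SIMD type such as XMVECTOR. The sketch assumes `_aligned_malloc` / `_aligned_free` on MSVC and C++17 `std::aligned_alloc` elsewhere. The point is that you overload `new` on your own class that contains the SIMD members, not on XMMATRIX itself.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <new>
#if defined(_MSC_VER)
#include <malloc.h>  // _aligned_malloc, _aligned_free
#endif

// Hypothetical helper: returns a block aligned to `alignment` bytes
// (alignment must be a power of two). Throws std::bad_alloc on failure.
inline void* AlignedAlloc(std::size_t size, std::size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
#if defined(_MSC_VER)
    void* p = _aligned_malloc(size, alignment);
#else
    // std::aligned_alloc expects the size to be a multiple of the alignment.
    const std::size_t rounded = (size + alignment - 1) / alignment * alignment;
    void* p = std::aligned_alloc(alignment, rounded);
#endif
    if (!p) throw std::bad_alloc();
    return p;
}

// Hypothetical helper: frees a block obtained from AlignedAlloc.
inline void AlignedFree(void* p) noexcept {
#if defined(_MSC_VER)
    _aligned_free(p);
#else
    std::free(p);
#endif
}

inline bool IsAligned(const void* p, std::size_t alignment) {
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}

// Stand-in for a 16-byte SIMD type such as XMVECTOR (assumption for this sketch).
struct alignas(16) Vec4 {
    float x, y, z, w;
};

// Hypothetical class with a SIMD-aligned member that gets created with new.
class Transform {
public:
    Vec4 position{0.0f, 0.0f, 0.0f, 1.0f};

    // Class-level overloads: every `new Transform` / `delete` goes through these.
    static void* operator new(std::size_t size) { return AlignedAlloc(size, alignof(Transform)); }
    static void operator delete(void* p) noexcept { AlignedFree(p); }
    static void* operator new[](std::size_t size) { return AlignedAlloc(size, alignof(Transform)); }
    static void operator delete[](void* p) noexcept { AlignedFree(p); }
};
```

With this in place, `Transform* t = new Transform;` returns memory aligned to `alignof(Transform)`. Note that standard containers such as `std::vector<Transform>` do not call the class's `operator new`, so they would need their own aligned allocator.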
Be very careful. With a bad memory allocator you can introduce many more problems than you solve.
Hope it helps somehow. Happy Coding! :)