I am trying to build an application that uses pthreads and the __m128 SSE type. According to the GCC manual, the default stack alignment is 16 bytes, and __m128 requires 16-byte alignment.
My target CPU supports SSE. I use a GCC compiler which doesn't support runtime stack realignment (e.g. -mstackrealign). I cannot use any other GCC compiler version.
My test application looks like:
    #include <xmmintrin.h>
    #include <pthread.h>

    void *f(void *x){
        __m128 y;
        ...
    }

    int main(void){
        pthread_t p;
        pthread_create(&p, NULL, f, NULL);
    }
The application raises an exception and exits. After some simple debugging (printf("%p", &y)), I found that the variable y is not 16-byte aligned.
My question is: how can I realign the stack properly (to 16 bytes) without using any GCC flags or attributes (they don't help)? Should I use GCC inline assembly within the thread function f()?
Another solution would be to use a padding function, which first aligns the stack and then calls f. So instead of calling f directly, you call pad, which pads the stack first and then calls f with an aligned stack. The code would look like this:
This shouldn't be happening in the first place, but to work around the problem you can try:
I have solved this problem. Here is my solution:
First, we grow the stack by 16 bytes. Second, we clear the least-significant nibble of the stack pointer (set it to 0x0). We preserve the original stack pointer using push/pop instructions. We then call another function, which has all of its own local variables 16-byte aligned. All nested functions will also have their local variables 16-byte aligned.
And it works!
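The arithmetic behind those two steps can be checked in plain C. This is only a sketch of the address calculation, not the inline assembly itself; the helper name realigned_sp is hypothetical, and it assumes a stack that grows downward, as on x86.

```c
#include <stdint.h>

/* Hypothetical helper: models the two steps on a stack-pointer value.
   Assumes a downward-growing stack, as on x86. */
static uintptr_t realigned_sp(uintptr_t sp)
{
    sp -= 16;                 /* step 1: grow the stack by 16 bytes */
    sp &= ~(uintptr_t)0xF;    /* step 2: clear the least-significant nibble */
    return sp;
}
```

For example, a stack pointer of 0xbffff3c8 becomes 0xbffff3b0: 16-byte aligned, and between 16 and 31 bytes below the original value.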
Sorry to resurrect an old thread...
For those with a newer compiler than the OP's: the OP mentions the -mstackrealign option, which led me to __attribute__((force_align_arg_pointer)). If your function is being optimized to use SSE but %ebp is misaligned, this attribute transparently applies the runtime fix when required. I also found out that this is only an issue on i386; the x86_64 ABI guarantees the arguments are aligned to 16 bytes.
    __attribute__((force_align_arg_pointer)) void i_crash_when_not_aligned_to_16_bytes() { ... }
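A slightly fuller sketch of how the attribute might be applied to a thread entry point follows. The names REALIGN_STACK and sum_entry are hypothetical, and the aligned float array only stands in for a __m128 local so that the example also compiles on targets without SSE headers. The macro guard is an assumption: the attribute is an x86 feature of GCC-compatible compilers, so it is dropped elsewhere.

```c
#include <stddef.h>

/* Assumption: the attribute is only meaningful for GCC-compatible
   compilers targeting x86, so it expands to nothing elsewhere. */
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#define REALIGN_STACK __attribute__((force_align_arg_pointer))
#else
#define REALIGN_STACK
#endif

/* Hypothetical thread entry point: sums four floats held in a
   16-byte-aligned local that stands in for a __m128 variable. */
REALIGN_STACK void *sum_entry(void *out)
{
    _Alignas(16) float v[4] = { 1.0f, 2.0f, 3.0f, 4.0f };
    float total = v[0] + v[1] + v[2] + v[3];

    if (out != NULL)
        *(float *)out = total;   /* hand the result back to the caller */
    return NULL;
}
```

The function has the signature that pthread_create expects for a start routine, so it could be passed as pthread_create(&p, NULL, sum_entry, &result), with result being a float owned by the caller.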
Cool article for those who might want to learn more: http://wiki.osdev.org/System_V_ABI
Allocate on the stack an array that is 15 bytes larger than sizeof(__m128), and use the first aligned address in that array. If you need several, allocate them in an array with a single 15-byte margin for alignment.
I do not remember whether allocating an unsigned char array makes you safe from strict-aliasing optimizations by the compiler, or whether that only works the other way round.
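A minimal sketch of the over-allocation idea, under a few assumptions: the helper names align_up_16 and sum_in_aligned_slot are hypothetical, and a block of four floats stands in for __m128 to keep the example portable. On the aliasing question, the exemption in the C standard covers reading any object through a character type rather than the reverse, so the sketch moves data with memcpy instead of casting the byte pointer.

```c
#include <stdint.h>
#include <string.h>

/* Round an address up to the next multiple of 16
   (returns p unchanged if it is already aligned). */
static unsigned char *align_up_16(unsigned char *p)
{
    uintptr_t a = (uintptr_t)p;
    return p + ((16u - (a & 15u)) & 15u);
}

/* Hypothetical example: keeps four floats in an aligned slot
   carved out of an oversized local byte array. */
static float sum_in_aligned_slot(const float in[4])
{
    unsigned char raw[sizeof(float[4]) + 15];  /* payload plus 15-byte margin */
    unsigned char *slot = align_up_16(raw);    /* first 16-byte-aligned address */
    float tmp[4];

    memcpy(slot, in, sizeof tmp);   /* byte-wise copies sidestep aliasing rules */
    memcpy(tmp, slot, sizeof tmp);
    return tmp[0] + tmp[1] + tmp[2] + tmp[3];
}
```

Because the offset added by align_up_16 is at most 15 bytes, the 16-byte slot always fits inside the 31-byte array.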