I am trying to build an application that uses pthreads and the __m128 SSE type. According to the GCC manual, the default stack alignment is 16 bytes, and __m128 requires 16-byte alignment.
My target CPU supports SSE. I use a GCC compiler which doesn't support runtime stack realignment (e.g. -mstackrealign). I cannot use any other GCC compiler version.
My test application looks like this:
#include <xmmintrin.h>
#include <pthread.h>
void *f(void *x){
__m128 y;
...
}
int main(void){
pthread_t p;
pthread_create(&p, NULL, f, NULL);
}
The application generates an exception and exits. After some simple debugging (printf "%p", &y), I found that the variable y is not 16-byte aligned.
My question is: how can I realign the stack properly (to 16 bytes) without using any GCC flags or attributes (they don't help)? Should I use GCC inline assembly within the thread function f()?
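The alignment check described above (printing &y and inspecting the low bits) can also be done in code. This is a minimal sketch; the helper name is_aligned is hypothetical and not part of the original test program.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 if p is a multiple of `alignment`,
   0 otherwise. `alignment` must be a nonzero power of two. */
static int is_aligned(const void *p, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    /* For a power of two, the low bits of an aligned address are all zero. */
    return ((uintptr_t)p & (alignment - 1)) == 0;
}
```

Inside f(), a call such as is_aligned(&y, 16) would return 0 in the failing case described here.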
Allocate on the stack an array that is 15 bytes larger than sizeof(__m128), and use the first aligned address in that array. If you need several, allocate them in an array with a single 15-byte margin for alignment.
I do not remember whether allocating an unsigned char array makes you safe from strict-aliasing optimizations by the compiler, or whether it only works the other way round.
#include <stdint.h>
#include <xmmintrin.h>
void *f(void *x)
{
unsigned char y[sizeof(__m128)+15];
/* round the array's address up to the next multiple of 16 */
__m128 *py = (__m128*) (((uintptr_t)y + 15) & ~(uintptr_t)15);
...
}
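For the "several values, one 15-byte margin" case mentioned above, the same rounding can be factored out. This is only a sketch: align_up, aligned_slot, SLOT_SIZE and SLOT_ALIGN are hypothetical names, and SLOT_SIZE is assumed to equal sizeof(__m128), i.e. 16. Plain bytes are used so the sketch does not depend on xmmintrin.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SLOT_SIZE  16  /* assumed to match sizeof(__m128) */
#define SLOT_ALIGN 16  /* required alignment in bytes */

/* Round addr up to the next multiple of align (a nonzero power of two). */
static uintptr_t align_up(uintptr_t addr, uintptr_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr + (align - 1)) & ~(align - 1);
}

/* Return the i-th 16-byte-aligned slot inside raw.
   raw must hold at least (i + 1) * SLOT_SIZE + SLOT_ALIGN - 1 bytes. */
static unsigned char *aligned_slot(unsigned char *raw, size_t i)
{
    /* Number of bytes to skip to reach the first aligned address. */
    size_t skip = (size_t)(align_up((uintptr_t)raw, SLOT_ALIGN) - (uintptr_t)raw);
    return raw + skip + i * SLOT_SIZE;
}
```

In f(), one would declare unsigned char raw[4 * SLOT_SIZE + SLOT_ALIGN - 1]; and cast aligned_slot(raw, 0) to __m128 * to get room for four values; the strict-aliasing question raised above still applies to that cast.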
This shouldn't be happening in the first place, but to work around the problem you can try:
void *f(void *x)
{
__m128 y __attribute__ ((aligned (16)));
...
}
Another solution would be to use a padding function, which first aligns the stack and then calls f. So instead of calling f directly, you call pad, which pads the stack first and then calls f with an aligned stack.
The code would look like this:
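#include <stdint.h>
#include <xmmintrin.h>
#include <pthread.h>
#define ALIGNMENT 16
void *f(void *x) {
__m128 y;
// other stuff
}
void * pad(void *val) {
unsigned int x; // to get the current address from the stack
// variable-length array that consumes the misaligned bytes;
// the compiler is free to drop it when optimizing, since it is never used
unsigned char padding[ALIGNMENT - ((uintptr_t) &x) % ALIGNMENT];
return f(val);
}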
int main(void){
pthread_t p;
pthread_create(&p, NULL, pad, NULL);
}
Sorry to resurrect an old thread...
For those with a newer compiler than the OP: the OP mentions the -mstackrealign option, which led me to __attribute__((force_align_arg_pointer)). If your function is being optimized to use SSE but the stack pointer (%esp) is misaligned on entry, this attribute performs the runtime realignment for you, transparently. I also found out that this is only an issue on i386. The x86_64 ABI guarantees that the stack is 16-byte aligned at function calls.
__attribute__((force_align_arg_pointer))
void i_crash_when_not_aligned_to_16_bytes() {
...
}
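Applied to the pthread case from the question, the attribute would go on the thread start routine, since that is where the misaligned stack enters the program. The sketch below makes some assumptions: the REALIGN_STACK macro and the thread_entry name are hypothetical, the attribute is enabled only when building for i386 with a GCC-compatible compiler, and a float array stands in for __m128 so the code does not depend on xmmintrin.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical convenience macro: only i386 needs the realignment. */
#if defined(__GNUC__) && defined(__i386__)
#define REALIGN_STACK __attribute__((force_align_arg_pointer))
#else
#define REALIGN_STACK
#endif

/* Thread start routine. arg points to an int that receives 1 if the
   16-byte-aligned local really ended up on a 16-byte boundary. */
REALIGN_STACK
static void *thread_entry(void *arg)
{
    float lanes[4] __attribute__((aligned(16))) = {1.0f, 2.0f, 3.0f, 4.0f};
    int *aligned_out = arg;

    assert(arg != NULL);
    *aligned_out = ((uintptr_t)lanes % 16) == 0;
    return NULL;
}
```

It would be started with pthread_create(&p, NULL, thread_entry, &flag), where flag is an int owned by the caller.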
Cool article for those who might want to learn more: http://wiki.osdev.org/System_V_ABI
I have solved this problem.
Here is my solution:
void another_function(){
__m128 y;
...
}
void *f(void *x){
asm("pushl %esp");
asm("subl $16,%esp");
asm("andl $-0x10,%esp");
another_function();
asm("popl %esp");
}
First, we grow the stack by 16 bytes. Second, we clear the least-significant nibble of the stack pointer, so that it becomes a multiple of 16. We save and restore the stack pointer using push/pop instructions. We then call another function, whose local variables are all 16-byte aligned. All nested functions will also have their local variables 16-byte aligned.
And it works!