On most platforms, alloca just boils down to an inline adjustment of the stack pointer (for example, subtracting from rsp on x64, plus a bit of logic to maintain stack alignment).
I was looking at the code that gcc generates for alloca and it is pretty weird. Take the following simple example [1]:
#include <alloca.h>
#include <stddef.h>
volatile void *psink;
void func(size_t x) {
    psink = alloca(x);
}
This compiles to the following assembly at -O2:
func(unsigned long):
        push rbp
        add rdi, 30
        and rdi, -16
        mov rbp, rsp
        sub rsp, rdi
        lea rax, [rsp+15]
        and rax, -16
        mov QWORD PTR psink[rip], rax
        leave
        ret
There are several confusing things here. First, I understand that gcc needs to round the allocated size up to a multiple of 16 (to maintain stack alignment), and the usual way to do that would be (size + 15) & ~0xF, but instead it adds 30 with add rdi, 30. What's up with that?
Second, I would just expect the result of alloca to be the new rsp value, which is already well-aligned. Instead, gcc does this:
lea rax, [rsp+15]
and rax, -16
This seems to be "realigning" the value of rsp to use as the result of alloca, but we already did the work to align rsp to a 16-byte boundary in the first place.
What's up with that?
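To make the second point concrete, here is a minimal sketch in C of what the lea/and pair computes. The helper name align_up_16 is hypothetical, and any addresses passed to it are made-up values for illustration, not real stack pointers.

```c
#include <stdint.h>

/* Mirrors `lea rax, [rsp+15]` followed by `and rax, -16`:
 * round an address up to the next multiple of 16. */
uintptr_t align_up_16(uintptr_t addr) {
    return (addr + 15) & ~(uintptr_t)15;
}
```

For an address that is already a multiple of 16, such as 0x7ffc0010, the function returns its input unchanged, which is why the extra two instructions look redundant once rsp has been aligned.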
You can play with the code on godbolt. It is worth noting that clang and icc do the "expected thing", on x86 at least. With VLAs (as suggested in earlier comments), gcc and clang do fine while icc produces an abomination.
[1] Here, the assignment to psink is just to consume the result of alloca, since otherwise the compiler just omits it entirely.
This is a very old, normal priority bug. The code works correctly. It's just that for most sizes, 16 more bytes than necessary are allocated. So it's not a correctness bug, it's a minor efficiency bug.
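The size of the over-allocation can be checked with a little arithmetic. The sketch below compares the (size + 15) & ~0xF idiom from the question with the add rdi, 30 / and rdi, -16 sequence from the listing. The function names are hypothetical, and the sketch only models the size computation, not anything else gcc does.

```c
#include <stddef.h>

/* Minimal round-up to a multiple of 16: the (size + 15) & ~0xF idiom. */
size_t round_up_minimal(size_t size) {
    return (size + 15) & ~(size_t)15;
}

/* What the listing computes: add rdi, 30 ; and rdi, -16. */
size_t round_up_gcc(size_t size) {
    return (size + 30) & ~(size_t)15;
}

/* Extra bytes reserved relative to the minimal rounding. */
size_t extra_bytes(size_t size) {
    return round_up_gcc(size) - round_up_minimal(size);
}
```

Under these assumptions, extra_bytes returns 16 for every size except those with size % 16 == 1 (such as 1 or 17), where the two formulas agree. For example, a request for 8 bytes reserves 32 bytes instead of 16.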