What's up with gcc's handling of alloca?

On most platforms, alloca just boils down to an inline adjustment of the stack pointer (for example, subtracting from rsp on x64, plus a bit of logic to maintain stack alignment).

I was looking at the code that gcc generates for alloca and it is pretty weird. Take the following simple example¹:

#include <alloca.h>
#include <stddef.h>

volatile void *psink;

void func(size_t x) {
  psink = alloca(x);
}

This compiles to the following assembly at -O2:

func(unsigned long):
        push    rbp
        add     rdi, 30
        and     rdi, -16
        mov     rbp, rsp
        sub     rsp, rdi
        lea     rax, [rsp+15]
        and     rax, -16
        mov     QWORD PTR psink[rip], rax
        leave
        ret

There are several confusing things here. First, I understand that gcc needs to round the allocated size up to a multiple of 16 (to maintain stack alignment), and the usual way to do that would be (size + 15) & ~0xF, but instead it adds 30 (add rdi, 30) before masking with -16. What's up with that?
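A minimal C sketch of the two size computations, useful for comparing them outside the compiler. The names round_up_16 and gcc_alloca_size are hypothetical; the second simply transcribes the add rdi, 30 / and rdi, -16 pair from the listing (in two's complement, -16 has the same bit pattern as ~15).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Textbook round-up to the next multiple of 16: (size + 15) & ~15. */
static size_t round_up_16(size_t size) {
    assert(size <= SIZE_MAX - 15);  /* size + 15 must not wrap around */
    return (size + 15) & ~(size_t)15;
}

/* Hypothetical transcription of the size computation in gcc's listing:
 *   add rdi, 30
 *   and rdi, -16   ; -16 has the same bit pattern as ~15 */
static size_t gcc_alloca_size(size_t size) {
    assert(size <= SIZE_MAX - 30);  /* size + 30 must not wrap around */
    return (size + 30) & ~(size_t)15;
}
```

For example, both functions give 16 for a size of 1, but gcc_alloca_size(2) is 32 while round_up_16(2) is 16.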

Second, I would just expect the result of alloca to be the new rsp value, which is already well-aligned. Instead, gcc does this:

    lea     rax, [rsp+15]
    and     rax, -16

This seems to "realign" the value of rsp to use as the result of alloca, but we already did the work to align rsp to a 16-byte boundary in the first place.
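To see why the lea/and pair cannot change anything here, the listing can be modelled with plain integers standing in for stack addresses. This is a hypothetical model, not gcc's own code: struct alloca_model and model_gcc_alloca are made-up names, and the sketch assumes the SysV x86-64 convention that rsp is 16-byte aligned before the call, so its value is 8 modulo 16 on entry.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: stack addresses are plain integers. */
struct alloca_model {
    uintptr_t new_sp;  /* rsp after `sub rsp, rdi`         */
    uintptr_t result;  /* rax, the value stored into psink */
};

static struct alloca_model model_gcc_alloca(uintptr_t entry_sp, size_t x) {
    /* Assumed SysV x86-64 convention: rsp is 16-byte aligned before the
     * call, so the pushed return address leaves it at 8 mod 16 on entry. */
    assert(entry_sp % 16 == 8);

    uintptr_t rsp = entry_sp - 8;                          /* push rbp */
    uintptr_t rdi = ((uintptr_t)x + 30) & ~(uintptr_t)15;  /* add rdi, 30 ; and rdi, -16 */
    rsp -= rdi;                                            /* sub rsp, rdi */
    /* mov rbp, rsp and leave do not affect the returned pointer. */

    struct alloca_model m;
    m.new_sp = rsp;
    m.result = (rsp + 15) & ~(uintptr_t)15;                /* lea rax, [rsp+15] ; and rax, -16 */
    return m;
}
```

Because push rbp leaves rsp 16-byte aligned and the subtracted size is a multiple of 16, rounding rsp + 15 down to a multiple of 16 always yields rsp itself in this model, which is exactly the redundancy described above.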

What's up with that?

You can play with the code on godbolt. It is worth noting that clang and icc do the "expected thing", at least on x86. With VLAs (as suggested in earlier comments), gcc and clang do fine, while icc produces an abomination.
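For the VLA comparison, a variant of func that uses a C99 variable-length array instead of alloca might look like the following sketch. The names func_vla and vla_sink are hypothetical; the address is stored as an integer rather than a pointer so that nothing dangling is kept after the function returns, and x must be non-zero because a VLA whose size is not greater than zero is undefined behavior.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

volatile uintptr_t vla_sink;  /* hypothetical sink for the array's address */

/* Same experiment as func(), but with a VLA instead of alloca. */
void func_vla(size_t x) {
    assert(x > 0);               /* a VLA's size must be greater than zero */
    char buf[x];
    vla_sink = (uintptr_t)buf;   /* consume the array so it is not optimized away */
}
```

When comparing listings, compiling with -DNDEBUG removes the assert so that only the allocation code remains.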

¹ Here, the assignment to psink is just there to consume the result of alloca; otherwise the compiler omits the call entirely.

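This is a very old, normal-priority bug. The code works correctly; it's just that the size gcc subtracts from rsp, (size + 30) & ~15, is 16 bytes larger than the minimal (size + 15) & ~15 for every size except those that are one more than a multiple of 16 (1, 17, 33, ...). So it's not a correctness bug, just a minor efficiency bug.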
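The size claim can be checked exhaustively over a small range with a sketch like this one. alloca_excess and count_wasteful_sizes are hypothetical helpers: the first compares gcc's (size + 30) & ~15 against (size + 15) & ~15, and the second asserts that the difference is only ever 0 or 16 bytes and counts how often it is 16.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes reserved by gcc's `add rdi, 30 ; and rdi, -16` beyond the
 * textbook (size + 15) & ~15 rounding. */
static size_t alloca_excess(size_t size) {
    assert(size <= SIZE_MAX - 30);  /* keep both sums from wrapping */
    size_t gcc = (size + 30) & ~(size_t)15;
    size_t textbook = (size + 15) & ~(size_t)15;
    return gcc - textbook;
}

/* Counts sizes in [0, limit) that get extra bytes, checking along the
 * way that the excess is never anything other than 0 or 16. */
static size_t count_wasteful_sizes(size_t limit) {
    size_t count = 0;
    for (size_t n = 0; n < limit; n++) {
        size_t e = alloca_excess(n);
        assert(e == 0 || e == 16);
        if (e != 0)
            count++;
    }
    return count;
}
```

Over sizes 0 to 63, count_wasteful_sizes(64) returns 60: only 1, 17, 33 and 49 avoid the extra 16 bytes.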