I have a recursive descent parser for an embedded ARM processor (in C + GCC, for ARM Cortex M3).
While running it I've noticed that it uses a massive amount of stack space (even more than you might expect) and under closer inspection I have found that this is happening:
extern int bar(int *p);
int foo() {
int z = foo(); // it's an example!
int n[100]; // stack usage
return z+bar(n); // calling bar(n) stops n from being optimised out
}
Result of running arm-none-eabi-gcc -fomit-frame-pointer -S test.c
foo:
str lr, [sp, #-4]! ; Push link register
sub sp, sp, #412 ; Reserve space on stack, even if we don't need it now!
bl foo ; Recurse
str r0, [sp, #404] ; Store result
...
So at the start of the function, it pushes the entire stack frame onto the stack. However after a few iterations it's got loads of stuff on the stack that it hasn't used yet.
Ideally, what I'd like is for GCC to generate:
foo:
str lr, [sp, #-4]! ; Push link register
; Don't reserve space, because we don't need it
bl foo ; Recurse
sub sp, sp, #412 ; Reserve space now
str r0, [sp, #404] ; Store result
...
(This is probably not correct but I hope you get the idea)
Something a bit like this can be achieved with the following code, but it's really nasty (and if GCC inlines fooworker, it breaks again!). There must be a better way?
int fooworker(int z) {
int n[100]; // stack usage
return z+bar(n); // calling bar(n) stops n from being optimised out
}
int foo() {
return fooworker(foo());
}
So is there a way of telling GCC to only enlarge the stack at the start of the basic block, or is there a 'barrier' statement that causes extra push/pop ops to be added at that point? I guess GCC is using one of the ARM standard call types - but is there a way to tag these functions with another call type that is a bit more efficient with the stack, or is there a way to rewrite the functions such that the stack is used a bit more sensibly?
Please don't tell me not to use recursion, it is not answering the question.