I'm writing the startup code for an embedded system -- the code that loads the initial stack pointer before jumping to the main() function -- and I need to tell it how many bytes of stack my application will use (or some larger, conservative estimate).
I've been told that gcc now has a -fstack-usage option and a -fcallgraph-info option that can somehow be used to statically calculate the exact "Maximum Stack Usage" for me ("Compile-time stack requirements analysis with GCC" by Botcazou, Comar, and Hainque).
Nigel Jones says that recursion is a really bad idea in embedded systems ("Computing your stack size" 2009), so I've been careful not to make any mutually recursive functions in this code.
Also, I make sure that none of my interrupt handlers ever re-enable interrupts until their final return-from-interrupt instruction, so I don't need to worry about re-entrant interrupt handlers.
Without recursion or re-entrant interrupt handlers, it should be possible to statically determine the maximum stack usage. (And so most of the answers to "How to determine maximum stack usage?" do not apply.) My understanding is that I (or preferably, some bit of code on my PC that is automatically run every time I rebuild the executable) first find the maximum stack depth for each interrupt handler when it's not interrupted by a higher-priority interrupt, and the maximum stack depth of the main() function when it is not interrupted. Then I add them all up to find the total (worst-case) maximum stack depth. That worst case occurs (in my embedded system) when the main() background task is at its maximum depth when it is interrupted by the lowest-priority interrupt, and that interrupt is at its maximum depth when it is interrupted by the next-lowest-priority interrupt, and so on.
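The "add them all up" step could be sketched as follows. This is only an illustration under stated assumptions: the function name, the priority-to-depths mapping, and the byte figures are all made up, and `frame_overhead` is a hypothetical parameter for whatever the hardware pushes on each interrupt entry (32 bytes would correspond to the Cortex-M3's basic eight-word exception frame, but check that against your own part and whether your per-handler figures already include it).

```python
def worst_case_stack(main_depth, isr_depths, frame_overhead=0):
    """Worst-case nesting: main() at its deepest, preempted once per priority level.

    main_depth     -- maximum stack depth of main() in bytes (uninterrupted)
    isr_depths     -- dict mapping a priority level to a list of handler depths
                      in bytes; handlers sharing a level are assumed not to
                      preempt each other, so only the deepest one is counted
    frame_overhead -- bytes pushed by hardware per interrupt entry (assumption)
    """
    total = main_depth
    for depths in isr_depths.values():
        total += max(depths) + frame_overhead
    return total


# Made-up figures, purely for illustration.
example = worst_case_stack(
    main_depth=400,
    isr_depths={1: [96], 2: [64, 120], 3: [48]},
    frame_overhead=32,
)
print(example)
```

If every handler has its own priority level, this reduces to the plain sum described above.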
I'm using YAGARTO with gcc 4.6.0 to compile code for the LM3S1968 ARM Cortex-M3.
So how do I use the -fstack-usage option and -fcallgraph-info option with gcc to calculate the maximum stack depth? Or is there some better approach to determine maximum stack usage?
(See "How to determine maximum stack usage in embedded system?" for almost the same question targeted to the Keil compiler.)
I ended up writing a Python script to implement τεκ's answer. It's too much code to post here, but it can be found on GitHub.
GCC docs:
I can't find any references to -fcallgraph-info.
You could potentially create the information you need from -fstack-usage and -fdump-tree-optimized.
For each leaf in the -fdump-tree-optimized output, get its parents and sum their stack sizes from -fstack-usage (keeping in mind that the reported number is unreliable for any function marked "dynamic" but not "bounded"). The maximum of these sums should be your maximum stack usage.
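A minimal sketch of that idea, under a few assumptions: the .su lines are taken to be tab-separated as `file:line:col:function`, a byte count, and a comma-separated qualifier list (check this against the files your own gcc emits), and the call graph is a hand-written dictionary rather than something parsed from -fdump-tree-optimized. The sample data and function names are invented.

```python
def parse_su(lines):
    """Map function name -> (bytes, set of qualifiers) from .su-style lines."""
    usage = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        name = fields[0].rsplit(":", 1)[-1]  # drop the file:line:col prefix
        usage[name] = (int(fields[1]), set(fields[2].split(",")))
    return usage


def unreliable(usage):
    """Functions marked dynamic but not bounded: their byte count is not a limit."""
    return sorted(n for n, (_, q) in usage.items()
                  if "dynamic" in q and "bounded" not in q)


def max_depth(func, calls, usage, _path=()):
    """Own frame plus the deepest callee chain; refuses recursive graphs."""
    if func in _path:
        raise ValueError("recursion through " + func)
    own = usage.get(func, (0, set()))[0]  # unknown functions count as 0 bytes
    callees = calls.get(func, [])
    return own + max((max_depth(c, calls, usage, _path + (func,))
                      for c in callees), default=0)


# Invented sample data, for illustration only.
sample = [
    "main.c:10:5:main\t24\tstatic\n",
    "main.c:3:6:fill\t272\tdynamic,bounded\n",
    "main.c:7:6:grow\t16\tdynamic\n",
]
su = parse_su(sample)
calls = {"main": ["fill", "grow"]}  # hypothetical call graph
print(max_depth("main", calls, su), unreliable(su))
```

Any function reported by `unreliable` would need manual review before the total can be trusted.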
I am not familiar with the -fstack-usage and -fcallgraph-info options. However, it is always possible to figure out actual stack usage by: 0xee.
Just in case no one comes up with a better answer, I'll post what I had in the comment to your other question, even though I have no experience using these options and tools:
GCC 4.6 adds the -fstack-usage option which gives the stack usage statistics on a function-by-function basis.
If you combine this information with a call graph produced by cflow or a similar tool you can get the kind of stack depth analysis you're looking for (a script could probably be written pretty easily to do this). Have the script read the stack-usage info and load up a map of function names with the stack used by the function. Then have the script walk the cflow graph (which can be an easy-to-parse text tree), adding up the stack usage associated with each line for each branch in the call graph.
So, it looks like this can be done with GCC, but you might have to cobble together the right set of tools.
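The script described above might look roughly like this. It is a sketch, not a tested tool: it assumes the call tree is a text tree with one function per line, four spaces of indentation per call level, and the function name before the first "(", which you would need to confirm against your own cflow output. The usage map and the tree below are invented.

```python
def deepest_branch(tree_lines, usage, indent=4):
    """Walk an indented call tree and return (bytes, path) for the deepest branch.

    tree_lines -- lines of an indented text tree, one call per line
    usage      -- dict mapping function name to its stack bytes
    indent     -- spaces per nesting level (assumption about the tree format)
    """
    branch = []      # (name, cumulative bytes) for the branch being walked
    best = (0, [])
    for raw in tree_lines:
        if not raw.strip():
            continue
        depth = (len(raw) - len(raw.lstrip(" "))) // indent
        name = raw.strip().split("(", 1)[0]
        del branch[depth:]  # climb back up to this line's parent
        # Functions missing from the map (e.g. library calls) count as 0 bytes,
        # which underestimates the total; handle them separately in real use.
        total = (branch[-1][1] if branch else 0) + usage.get(name, 0)
        branch.append((name, total))
        if total > best[0]:
            best = (total, [n for n, _ in branch])
    return best


# Invented data, for illustration only.
usage = {"main": 32, "init": 16, "uart_setup": 48,
         "loop": 24, "parse": 200, "send": 64}
tree = [
    "main() <int main (void) at main.c:20>:",
    "    init()",
    "        uart_setup()",
    "    loop()",
    "        parse()",
    "        send()",
]
worst_bytes, worst_path = deepest_branch(tree, usage)
print(worst_bytes, " -> ".join(worst_path))
```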
Quite late, but for anyone looking at this: the answers that combine the output of -fstack-usage with call graph tools like cflow can end up being wildly incorrect for any dynamic allocation, even a bounded one, because there is no information about when that dynamic stack allocation occurs. It is therefore not possible to know which called functions the value should be applied to. As a contrived example, if (simplified) -fstack-usage output is:
and a very simple call tree is:
The naive approach of combining these may result in main -> functionA being chosen as the path of maximum stack usage, at 1536 bytes. But if the largest dynamic stack allocation in main() is to push a large argument, like a record, to functionB() directly on the stack in a conditional block that calls functionB (I already said this was contrived), then main -> functionB is really the path of maximum stack usage, at 1040 bytes. Depending on the existing software design, and also for other more restricted targets that pass everything on the stack, cumulative errors may quickly lead you toward looking at entirely wrong paths that claim significantly overstated maximum stack sizes.
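To make the arithmetic concrete, here is a small sketch. The per-function figures are assumptions chosen only so that they reproduce the 1536 and 1040 byte totals quoted above; they are not real tool output. The split of main()'s figure into a fixed part and a part that exists only around the call to functionB is likewise hypothetical.

```python
# Assumed figures that reproduce the totals in the text (1536 and 1040 bytes).
reported = {"main": 1024, "functionA": 512, "functionB": 16}
calls = {"main": ["functionA", "functionB"]}

# Naive combination: charge main()'s full reported figure to every callee path.
naive = {callee: reported["main"] + reported[callee] for callee in calls["main"]}
naive_worst = max(naive, key=naive.get)

# Hypothetical split of main()'s 1024 bytes: 24 bytes always present, and
# 1000 bytes pushed only in the conditional block that calls functionB.
MAIN_FIXED = 24
MAIN_DYNAMIC = 1000
actual = {
    "functionA": MAIN_FIXED + reported["functionA"],
    "functionB": MAIN_FIXED + MAIN_DYNAMIC + reported["functionB"],
}
actual_worst = max(actual, key=actual.get)

print("naive :", naive_worst, naive[naive_worst])
print("actual:", actual_worst, actual[actual_worst])
```

Under these assumptions the naive sum picks the wrong path and overstates the requirement, which is the failure mode described above.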
Also, depending on your classification of "reentrant" when talking about interrupts, it's possible to miss some stack allocations entirely. For instance, many Coldfire processors' level 7 interrupt is edge-sensitive and therefore ignores the interrupt disable mask, so if a semaphore is used to leave the instruction early, you may not consider it reentrant, but the initial stack allocation will still happen before the semaphore is checked.
In short, you have to be extremely careful about using this approach.