In multi-threaded embedded software (written in C or C++), a thread must be given enough stack space to complete its operations without overflowing. Correct sizing of the stack is critical in some real-time embedded environments, because (at least in some systems I've worked with) the operating system will NOT detect stack overflow for you.
Usually, the stack size for a new thread (other than the main thread) is designated at the time that thread is created (e.g. via the thread attributes passed to pthread_create(), or the like). Often, these stack sizes are hard-coded to values that were known to be good at the time the code was originally written or tested.
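For concreteness, here's a minimal sketch (POSIX threads; the 64 KiB figure and the names are illustrative, not recommendations) of how a stack size typically ends up hard-coded at thread-creation time:

```c
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

#define WORKER_STACK_SIZE (64 * 1024)   /* hard-coded "known good" value */

static void *worker(void *arg)
{
    (void)arg;
    /* ... thread body ... */
    return NULL;
}

int main(void)
{
    pthread_attr_t attr;
    pthread_t tid;

    pthread_attr_init(&attr);
    /* Must be at least PTHREAD_STACK_MIN; error handling omitted for brevity. */
    pthread_attr_setstacksize(&attr, WORKER_STACK_SIZE);

    if (pthread_create(&tid, &attr, worker, NULL) != 0) {
        perror("pthread_create");
        return EXIT_FAILURE;
    }
    pthread_join(tid, NULL);
    pthread_attr_destroy(&attr);
    return 0;
}
```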
However, future changes to the code often break the assumptions on which the hard-coded stack sizes were based, and one fateful day, your thread enters one of the deeper branches of its call graph and overflows the stack -- bringing down the whole system or silently corrupting memory.
I have personally seen this problem in cases where code executed by the thread declares struct instances on the stack. When the struct is augmented to hold additional data, the thread's stack usage inflates accordingly, potentially causing stack overflows. I imagine this could be a huge problem for established codebases, where the full effects of adding a field to a structure cannot be known immediately (too many threads/functions to find all the places where that struct is used).
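As a toy illustration (the struct and function names here are made up), the frame below grows every time someone adds a field to the struct, even though the thread's stack budget was fixed long ago:

```c
/* Hypothetical example: every function that declares one of these on the
 * stack gets a bigger frame whenever the struct grows. */
struct sensor_report {
    int   id;
    float readings[32];     /* widening this array, or...              */
    /* char trace[256]; */  /* ...adding a field like this, inflates
                               the stack usage of every caller below.  */
};

void process_next_report(void)
{
    struct sensor_report report; /* lives on this thread's stack */
    /* ... fill in and use 'report' ... */
    (void)report;
}
```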
Since the usual response to "stack sizing" questions is "they're not portable", let's assume that the compiler, operating system, and processor are all known quantities for this investigation. Let's also assume recursion isn't used, so we're not dealing with the possibility of an "infinite recursion" scenario.
What are some reliable ways to estimate the necessary stack size for a thread? I'd prefer methods that are offline (static analysis) and automatic, but all ideas are welcome.
This is not an offline method, but on the project I am working on we have a debug command that reads the high-water mark of every task stack in the application. It outputs a table of the stack usage for each task and the amount of headroom remaining. Checking this data after a 24-hour run with lots of user interaction gives us some confidence that the defined stack allocations are "safe".
This works using the well-tried technique of filling the stacks with a known pattern and assuming that the only way it can be overwritten is by normal stack usage; although if it is being written by any other means, a stack overflow is the least of your worries!
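For anyone wanting to try this, here is a rough sketch of the pattern-fill approach, assuming a downward-growing stack and an application-supplied stack buffer; the names stack_paint() and stack_headroom() are made up for illustration:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define STACK_SIZE  (32 * 1024)
#define STACK_FILL  0xAAu

/* Application-supplied stack buffer; alignment attribute is GCC-specific. */
static uint8_t task_stack[STACK_SIZE] __attribute__((aligned(16)));

/* Fill the entire stack with the known pattern before the task starts. */
void stack_paint(void)
{
    memset(task_stack, STACK_FILL, sizeof task_stack);
}

/* With a downward-growing stack, the low end of the buffer is the last part
 * to be touched. Count how many bytes still hold the fill pattern: that is
 * the headroom remaining at the observed high-water mark. */
size_t stack_headroom(void)
{
    size_t unused = 0;
    while (unused < sizeof task_stack && task_stack[unused] == STACK_FILL)
        unused++;
    return unused;
}
```

You would paint the buffer before creating the task, hand it to the thread (e.g. with pthread_attr_setstack(), or your RTOS's equivalent), and have the debug command call stack_headroom() for each task to build the table.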
You can use a static analysis tool like StackAnalyzer, if your target fits the requirements.
As discussed in the answer to this question, a common technique is to initialise the stack with a known value and then to run the code for a while and see where the pattern stops.
We tried to solve this problem on an embedded system at my work. It got crazy: there was just too much code (both our own and 3rd-party frameworks) to get any reliable answer. Luckily, our device was Linux-based, so we fell back to the standard behaviour of giving every thread 2 MB and letting the virtual memory manager optimize the usage.
Our one issue with this solution was that one of the 3rd-party tools performed an mlock() on its entire memory space (presumably to improve performance). This caused all 2 MB of stack for each of its threads (75-150 of them) to be paged in. We lost half of our memory space until we figured it out and commented out the offending line.

Sidenote: Linux's virtual memory manager (VMM) allocates RAM in 4 KB pages. When a new thread asks for 2 MB of address space for its stack, the VMM assigns bogus memory pages to all but the topmost page. When the stack grows into a bogus page, the kernel detects a page fault and swaps the bogus page for a real one (which consumes another 4 KB of actual RAM). This way a thread's stack can grow to any size it needs (as long as it is less than 2 MB) and the VMM ensures that only a minimal amount of physical memory is used.
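To make the sidenote concrete, here is a small sketch of the behaviour described above (Linux/glibc assumed; the thread count and stack size are just illustrative, and error handling is omitted):

```c
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/mman.h>

#define NUM_THREADS 100
#define STACK_SIZE  (2 * 1024 * 1024)   /* 2 MB of address space per thread */

static void *idle_worker(void *arg)
{
    (void)arg;
    pause();        /* park the thread; it touches almost none of its stack */
    return NULL;
}

int main(void)
{
    pthread_attr_t attr;
    pthread_t tids[NUM_THREADS];

    /* Uncommenting this is what bit us: it forces every page of every 2 MB
     * stack below to be backed by real RAM up front (and may need elevated
     * RLIMIT_MEMLOCK). */
    /* mlockall(MCL_CURRENT | MCL_FUTURE); */

    pthread_attr_init(&attr);
    pthread_attr_setstacksize(&attr, STACK_SIZE);

    for (int i = 0; i < NUM_THREADS; i++)
        pthread_create(&tids[i], &attr, idle_worker, NULL);

    /* Compare RSS in /proc/self/status with and without mlockall():
     * without it, the resident set stays small because the stack pages are
     * only faulted in as each thread actually touches them. */
    getchar();
    return 0;
}
```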