This question has been mystifying me for years, and considering this site's name, this is the place to ask it.
Why do we, programmers, still have this StackOverflow problem?
Why in every major language does the thread stack memory have to be statically allocated on thread creation?
I will speak in the context of C#/Java, because I use them most, but this is probably a broader problem.
A fixed stack size leads to huge problems:
- There is no way to write a recursive algorithm unless you are absolutely sure that the recursion depth is tiny. The linear memory cost of a recursive algorithm is often unacceptable.
- There is no cheap way to start new threads. You have to allocate a huge block of memory for the stack to account for all possible uses of the thread.
- Even if you don't use very deep recursion, you always risk running out of stack space, simply because the stack size is an arbitrary fixed number. Considering that a StackOverflow is usually unrecoverable, this is a big problem in my eyes (see the sketch after this list).
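To make the last point concrete, here is a minimal Java sketch (my own illustration, not part of the original question) that probes how many frames fit in a default-sized thread stack before a StackOverflowError hits:

```java
public class StackDepthProbe {
    static int depth = 0;

    static void recurse() {
        depth++;
        recurse();
    }

    public static void main(String[] args) {
        try {
            recurse();
        } catch (StackOverflowError e) {
            // With a default stack (typically 512 KB to 1 MB), this usually
            // reports only a few tens of thousands of frames.
            System.out.println("Overflowed after " + depth + " frames");
        }
    }
}
```

The exact number depends on the JVM, the frame size, and the -Xss setting, which is precisely the "arbitrary fixed number" complaint.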
Now, if the stack were resized dynamically, all of the problems above would be greatly alleviated, because a stack overflow would only be possible when there is an actual memory overflow.
But this is not the case yet. Why? Are there some fundamental limitations of modern CPUs that would make it impossible or inefficient? As for the performance hit that reallocations would impose, it should be acceptable, because people use structures like ArrayList all the time without suffering much.
So the question is: am I missing something and StackOverflow is not a problem, or am I missing something and there are a lot of languages with a dynamic stack, or is there some big reason why this is impossible or hard to implement?
Edit: Some people said that performance would be a large problem, but consider this:
- We leave the compiled code untouched. The stack access stays the same, thus the "usual case" performance stays the same.
- We handle the CPU exception that occurs when the code tries to access unallocated memory, and launch our "reallocation" routine. Reallocations won't be frequent because <put your usual ArrayList argument here>. This should work on most protected-mode CPUs without loss of performance. No?
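For reference, the "usual ArrayList argument" is that geometric growth makes reallocations logarithmically rare. A minimal Java sketch (my own illustration) that counts them under a doubling policy:

```java
// Counts how many reallocations a doubling growth policy needs.
public class DoublingGrowth {
    public static void main(String[] args) {
        int capacity = 1;          // start tiny, like a fresh stack segment
        int reallocations = 0;
        int pushes = 10_000_000;

        for (int size = 1; size <= pushes; size++) {
            if (size > capacity) { // we ran off the end of the allocation
                capacity *= 2;     // grow geometrically; the copy would go here
                reallocations++;
            }
        }
        // Prints 24: ten million pushes, two dozen reallocations.
        System.out.println(reallocations + " reallocations for " + pushes + " pushes");
    }
}
```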
1) In order to resize stacks, you have to be able to move memory around, meaning that pointers to anything on a stack can become invalid after a stack resize. Yes, you can use another level of indirection to solve this problem, but remember that the stack is used very, very frequently.
2) It makes things significantly more complicated. Push/pop operations on stacks usually amount to simple pointer arithmetic on a CPU register; that's why allocation on the stack is faster than allocation on the free store (see the sketch after this list).
3) Some CPUs (microcontrollers in particular) implement the stack directly on hardware, separate from the main memory.
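To illustrate point 2, here is a hedged Java analogy (Java has no raw pointers, so an array index stands in for the stack-pointer register) of why stack allocation is so cheap: push and pop are single additions and subtractions, which is exactly the fast path a resizable stack must not break:

```java
// Toy bump-pointer "stack": allocating a frame is one addition,
// freeing it is one subtraction — nothing else happens.
public class BumpStack {
    private final int[] memory = new int[1 << 16]; // fixed arena, like a stack segment
    private int sp = 0;                            // the "stack pointer" register

    int allocate(int slots) { // "push a frame"
        int base = sp;
        sp += slots;          // a real CPU just adjusts its stack register
        return base;
    }

    void free(int slots) {    // "pop a frame"
        sp -= slots;
    }

    public static void main(String[] args) {
        BumpStack s = new BumpStack();
        int frame = s.allocate(4); // four local slots, cost: one addition
        s.memory[frame] = 42;      // use a "local variable" in the frame
        s.free(4);
        System.out.println("frame was at offset " + frame);
    }
}
```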
Also, you can set the stack size of a thread when you create it; for example, the Windows CRT function _beginthread() takes a stack-size argument, so if you find that the extra stack space is unnecessary, you can set the stack size accordingly.

From my experience, stack overflows are usually caused by infinite recursion or by recursive functions that allocate huge arrays on the stack. According to MSDN, the default stack size set by the linker is 1 MB (the header of an executable can set its own default), which seems to be more than big enough for the majority of cases.
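The languages from the question expose a similar knob. For instance, Java lets you pass a per-thread stack size through a Thread constructor (the value is only a hint that the JVM may round up or ignore). A minimal sketch:

```java
public class SmallStackThread {
    public static void main(String[] args) throws InterruptedException {
        Runnable task = () -> System.out.println("running with a small stack");
        // null = current thread group; 64 KB is a deliberately tiny stack.
        Thread t = new Thread(null, task, "small-stack", 64 * 1024);
        t.start();
        t.join();
    }
}
```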
The fixed-stack mechanism works well enough for the majority of applications, so there's no real need to change it. If it doesn't work for you, you can always roll your own stack.
I am going to summarize the arguments in the answers so far, because I find that no single answer covers this topic well enough.
Static stack investigation:
- Motivation: not everyone needs it.
- Implementation difficulties: a dynamic stack implementation turns out to be not as straightforward as it seems.
- Existing implementations: some languages and runtime libraries already have a dynamic stack feature or something similar to it. I would like to see more examples here.
I hope I didn't forget any important pieces of information on this subject. Making this a community wiki so that anyone can add new information.
I can't speak for "major languages". Many "minor" languages use heap-allocated activation records, with each call using a chunk of heap space instead of a chunk of a linear stack. This allows recursion to go as deep as you have address space to allocate.
Some folks here claim that recursion that deep is wrong, and that using a "big linear stack" is just fine. That isn't right. I'd agree that if you have to use the entire address space, you do have a problem of some kind. However, when one has very large graph or tree structures, you want to allow deep recursion, and you don't want to guess first at how much linear stack space you need, because you'll guess wrong.
If you decide to go parallel and you have lots of "grains" (think: small threads, thousands to millions of them), you can't have 10 MB of stack space allocated to each one, because you'd be wasting gigabytes of RAM (a million grains at 10 MB each is roughly 10 TB). How on earth could you ever have a million grains? Easy: lots of grains that interlock with one another. When a grain is frozen waiting for a lock, you can't get rid of it, yet you still want to run other grains to use your available CPUs. This maximizes the amount of available work, and thus allows many physical processors to be used effectively.
The PARLANSE parallel programming language uses this very-large-number-of-parallel-grains model, with heap allocation on function calls. We designed PARLANSE to enable the symbolic analysis and transformation of very large source programs (say, several million lines of code). These produce... giant abstract syntax trees, giant control/data-flow graphs, giant symbol tables, with tens of millions of nodes. Lots of opportunity for parallel workers.
The heap allocation allows PARLANSE programs to be lexically scoped, even across parallelism boundaries, because one can implement "the stack" as a cactus stack, where forks occur in "the stack" for subgrains, and each grain can consequently see the activation records (parent scopes) of its callers. This makes passing big data structures cheap when recursing; you just reference them lexically.
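As a hedged toy model (my own sketch, not PARLANSE's actual representation), a cactus stack is just a tree of heap-allocated frames with parent links: grains forked from the same frame share that parent and resolve names by walking up the chain.

```java
import java.util.HashMap;
import java.util.Map;

// Toy cactus stack: activation records live on the heap, linked into a tree.
class Frame {
    final Frame parent;                 // caller's activation record (lexical link)
    final Map<String, Object> locals = new HashMap<>();

    Frame(Frame parent) { this.parent = parent; }

    // Lexical lookup: walk up the parent chain, even across a fork.
    Object lookup(String name) {
        for (Frame f = this; f != null; f = f.parent) {
            Object v = f.locals.get(name);
            if (v != null) return v;
        }
        throw new IllegalStateException("unbound: " + name);
    }
}

class CactusDemo {
    public static void main(String[] args) {
        Frame root = new Frame(null);
        root.locals.put("bigTree", "shared data");

        // Two "grains" forked from the same frame: the stack forks here,
        // and both children see the parent's locals without copying.
        Frame grainA = new Frame(root);
        Frame grainB = new Frame(root);

        System.out.println(grainA.lookup("bigTree"));
        System.out.println(grainB.lookup("bigTree"));
    }
}
```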
One might think that heap allocation slows down the program. It does; PARLANSE pays about a 5% penalty in performance but gains the ability to process very large structures in parallel, with as many grains as the address space can hold.
Stack size and allocation are not necessarily related to the language you are using. They are more a question of processor and architecture.
Stack segments are limited to 4 GB on current Intel processors.
The following link is a good read and may give you some of the answers you seek:
http://www.intel.com/Assets/PDF/manual/253665.pdf - Chapter 6.2
Having practically infinite stack space would be very bad in the case of an infinite recursion, because it would turn an easily diagnosed error (stack overflow) into a much more problematic one (out of memory). With a stack overflow, a look at the stack trace tells you fairly quickly what is going on. By contrast, when the system is running out of memory, it may attempt other ways of coping, such as using swap space, resulting in serious performance degradation.
On the other hand, I have rarely had issues with hitting the stack-overflow barrier due to recursion. I can think of a couple of circumstances where it happened, but moving to my own stack, implemented as a std::vector, was a simple solution to the problem.
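In Java/C# terms (the languages from the question), the same trick means replacing the call stack with an explicit heap-backed stack. A hedged sketch of an iterative depth-first walk whose depth is limited only by the heap:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

class Node {
    final List<Node> children;
    Node(Node... children) { this.children = List.of(children); }
}

public class DeepWalk {
    // Iterative DFS: the Deque lives on the heap and grows as needed,
    // so the walkable depth is bounded by heap size, not thread stack size.
    static int countNodes(Node root) {
        int count = 0;
        Deque<Node> stack = new ArrayDeque<>();
        stack.push(root);
        while (!stack.isEmpty()) {
            Node n = stack.pop();
            count++;
            for (Node child : n.children) stack.push(child);
        }
        return count;
    }

    public static void main(String[] args) {
        // A pathological "linked list" tree a million levels deep:
        Node deep = new Node();
        for (int i = 0; i < 1_000_000; i++) deep = new Node(deep);
        System.out.println(countNodes(deep)); // 1000001, no StackOverflowError
    }
}
```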
Now, what would be neat is if the language would allow me to mark a particular function as "heavily recursive", and then have it operate in its own stack space. That way I'd generally get the advantage of stopping when my recursion is out of whack, but I could still make use of extensive recursion when I wanted to.
Old language implementations have a static stack size, so most new popular languages (which just copied old languages and broke/fixed whatever they felt like) have the same issue.
There is no logical reason to have a static stack size unless you are in a formal-methods setting. Why introduce faults where the code is correct? Erlang, for example, doesn't do this, because it handles faults, as any sane partial programming language should.