As far as I can see, stack memory is contiguous in virtual memory address, but stack memory is also contiguous physically? And does this have something to do with the stack size limit?
Edit:
I used to believe that stack memory doesn't has to be contiguous physically, but why do we think that stack memory is always quicker than heap memory? If it's not physically contiguous, how can stack take more advantage of cache? And there is another thing that always confuse me, cpu executes directives in data segment, which is not near the stack segment in virtual memory, I don't think the operating system will make stack segment and data segment close to each other physically, so this might do harm to the cache effect, what do you think?
Edit again:
Maybe I should give an example to express myself better, if we want to sort a large amount of numbers, using array to store the numbers is better than using a list, because every list node may be constructed by malloc
, so it may not take good advantage of cache, that's why I say stack memory is quicker than heap memory.
As far as I can see, stack memory is contiguous in virtual memory
address, but stack memory is also contiguous physically? And does this
have something to do with the stack size limit?
No, stack memory is not necessarily contiguous in the physical address space. It's not related to the stack size limit. It's related to how the OS manages memory. The OS only allocates a physical page when the corresponding virtual page is accessed for the first time (or for the first time since it got paged out to the disk). This is called demand-paging, and it helps conserve memory usage.
why do we think that stack memory is always quicker
than heap memory? If it's not physically contiguous, how can stack
take more advantage of cache?
It has nothing to do with the cache. It's just faster to allocate and deallocate memory from the stack than the heap. That's because allocating and deallocating from the stack takes only a single instruction (incrementing or decrementing the stack pointer). On the other hand, there is a lot more work involved into allocating and/or deallocating memory from the heap. See this article for more information.
Now once memory allocated (from the heap or stack), the time it takes to access that allocated memory region does not depend on whether it's stack or heap memory. It depends on the memory access behavior and whether it's friendly to the cache and memory architecture.
if we want to sort a large amount of numbers, using array to store the
numbers is better than using a list, because every list node may be
constructed by malloc, so it may not take good advantage of cache,
that's why I say stack memory is quicker than heap memory.
Using an array is faster not because arrays are allocated from the stack. Arrays can be allocated from any memory (stack, heap, or anywhere). It's faster because arrays are usually accessed contiguously one element at a time. When the first element is accessed, a whole cache line that contains the element and other elements is fetched from memory to the L1 cache. So accessing the other elements in that cache line can be done very efficiently, but accessing the first element in the cache line is still slow (unless the cache line was prefetched). This is the key part: since cache lines are 64-byte aligned and both virtual and physical pages are 64-byte aligned as well, then it's guaranteed that any cache line fully resides within a single virtual page and a single physical page. This what makes fetching cache lines efficient. Again, all of this has nothing to do with whether the array was allocated from the stack or heap. It holds true either way.
On the other hand, since the elements of a linked list are typically not contiguous (not even in the virtual address space), then a cache line that contains an element may not contain any other elements. So fetching every single element can be more expensive.
No, there is no promise of contiguity of physical addresses. But it doesn't matter, because user-space programs don't use physical addresses, so have no idea that this is the case.
Memory is memory. Stack memory is no faster than heap memory and is no slower. It is all the same. The only thing that makes a memory a stack or a heap is how it is allocated by the application. It is entirely possible to allocate a memory on the heap and make that the program stack.
The speed difference is in the allocation. Stack memory is allocated by subtracting from the stack pointer: one instruction.
The process of allocating heap depends upon the heap manager but it is much more complex and may requiring mapping pages to the address space.
It is a complex topic.
Heap and stack have (usually) the same memory and memory type (MTRR, cache setting per page, etc.). [mmap, files, drivers could have different strategies, or when user explicit change it].
Stack could be faster, because it is often used. When you call a function, parameters and local variables are put into stack, so the cache is fresh. Additionally, because functions call and return often, probably there is some more stack in the other cache level, and seldom the top of stack is paged (because it was used recently).
So cache could be faster, but just if you have few variables. If you allow large arrays on stack e.g. with alloca
, the advantage disappear.
In general, this is a very complex topic, and it is better not to optimize too much, because it could cause complex code, so more difficult to refactor and high level optimization of code. (e.g. on multi-dimentional arrays, the order of indices (and so memory) and loops could improve sensible the speed, but also quickly the code will be impossible to maintain).