I'm currently trying to understand how the stack works, so I've decided teach myself some assembly language, I'm using this book:
http://savannah.nongnu.org/projects/pgubook/
I'm using Gas and doing my development on Linux Mint.
I'm a bit confused by something:
As far as I was aware a stack is simply a data structure. So I assumed if I was coding in assembly I'd have to implement the stack myself. However this doesn't seem to be the case as there are commands like
pushl
popl
So when coding in assembly for the x86 architecture and using the Gas syntax: is the stack just a data structure that's already implemented? Or is it actually implemented at the hardware level? Or is it something else? Also would most assembly languages for other chip sets have the stack already implemented?
I know this is a bit of a foolish question but I'm actually quite confused by this.
stack
is part of memory. it use forinput
andoutput
offunctions
. also it use for remembering function's return.esp
register is remember the stack address.stack
andesp
are implemented by hardware. also you can implement it yourself. it will make your program very slow.example:
nop //
esp
= 0012ffc4push 0 //
esp
= 0012ffc0 ,Dword[0012ffc0]=00000000call proc01 //
esp
= 0012ffbc ,Dword[0012ffbc] =eip
,eip
= adrr[proc01]pop
eax
//eax
= Dword[esp
],esp
=esp
+ 4Calling functions, which requires saving and restoring local state in LIFO fashion (as opposed to say, a generalized co-routine approach), turns out to be such an incredibly common need that assembly languages and CPU architectures basically build this functionality in. The same could probably be said for notions of threading, memory protection, security levels, etc. In theory you could implement your own stack, calling conventions, etc., but I assume some opcodes and most existing runtimes rely on this native concept of "stack".
Regarding whether the stack is implemented in the hardware, this Wikipedia article might help.
That article and the others it links to might be useful to get a feel for stack usage in processors.
I think that main answer you are looking for has already been hinted at.
When an x86 computer boots up, the stack is not setup. The programmer must explicitly set it up at boot time. However, if you are already in an operating system, this has been taken care of. Below is a code sample from a simple bootstrap program.
First the data and stack segment registers are set, and then the stack pointer is set 0x4000 beyond that.
After this code the stack may be used. Now I am sure it can be done in a number of different ways, but I think this should illustrate the idea.
I think primarily you're getting confused between a
program's stack
andany old stack
.A Stack
Is an abstract data structure which consists of information in a Last In First Out system. You put arbitrary objects onto the stack and then you take them off again, much like an in/out tray, the top item is always the one that is taken off and you always put on to the top.
A Programs Stack
Is a stack, it's a section of memory that is used during execution, it generally has a static size per program and frequently used to store function parameters. You push the parameters onto the stack when you call a function and the function either address the stack directly or pops off the variables from the stack.
A programs stack isn't generally hardware (though it's kept in memory so it can be argued as such), but the Stack Pointer which points to a current area of the Stack is generally a CPU register. This makes it a bit more flexible than a LIFO stack as you can change the point at which the stack is addressing.
You should read and make sure you understand the wikipedia article as it gives a good description of the Hardware Stack which is what you are dealing with.
There is also this tutorial which explains the stack in terms of the old 16bit registers but could be helpful and another one specifically about the stack.
From Nils Pipenbrinck:
It's worthy of note that some processors do not implement all of the instructions for accessing and manipulating the stack (push, pop, stack pointer, etc) but the x86 does because of it's frequency of use. In these situations if you wanted a stack you would have to implement it yourself (some MIPS and some ARM processors are created without stacks).
For example, in MIPs a push instruction would be implemented like:
and a Pop instruction would look like:
(I've made a gist of all the code in this answer in case you want to play with it)
I have only ever did most basic things in asm during my CS101 course back in 2003. And I had never really "got it" how asm and stack work until I've realized that it's all basicaly like programming in C or C++ ... but without local variables, parameters and functions. Probably doesn't sound easy yet :) Let me show you (for x86 asm with Intel syntax).
1. What is the stack
Stack is a contiguous chunk of memory allocated for every thread when it starts. You can store there whatever you want. In C++ parlance (code snippet #1):
2. Stack's top and bottom
In principle, you could store values in random cells of
stack
array (snippet #2.1):But imagine how hard would it be to remember which cells of
stack
are already in use and wich ones are "free". That's why we store new values on the stack next to each other.One weird thing about (x86) asm's stack is that you add things there starting with the last index and move to lower indexes: stack[999], then stack[998] and so on (snippet #2.2):
And still (coution, you're gonna be confused now) the "official" name for
stack[999]
is bottom of the stack.The last used cell (
stack[997]
in the example above) is called top of the stack (see Where the top of the stack is on x86).3. Stack pointer (SP)
Stack is not the only thing visible everywhere in your asm code. You can also manipulate CPU registers (see General-Purpose Registers). They are really like global variables:
There is dedicated CPU register (SP) to keep track of the last element added to the stack. As name suggests it's, well, a pointer (holds a memory address like 0xAAAABBCC). But for the purposes of this post I'll use it as an index.
At thread's start
SP == STACK_CAPACITY
and then you decrement it as a need arises. The rule is you can't write to stack cells beyond stack's top and any index less then SP is invalid, so you first decrement SP and then write a value to the newly allocated cell.When you know you will add several values on the stack in a row, you can reserve space for all of them upfront (snippet #3):
Note. Now you can see why "allocating" on the stack is so fast. You don't actually allocate anything (as in
new
keyword ormalloc
), it's just a single integer decrement.4. Getting rid of local variables
Let's take this simplistic function (snippet #4.1):
and rewrite it without local variable (snippet #4.2):
usage (snippet #4.3):
5. Push / pop
Addition of a new element on the top of the stack is such a frequent operation, that CPUs have a special instruction for that,
push
. We'll implent it like this (snippet 5.1):Likewise, taking the top element of the stack (snippet 5.2):
Common usage pattern for push/pop is temporarily saving some value. Say, we have something useful in variable
myVar
and for some reason we need to do calculations which will overwrite it (snippet 5.3):6. Getting rid of parameters
Now let's pass parameters using stack (snippet #6):
7. Getting rid of
return
statementsLet's return value in AX register (snippet #7):
8. Stack base pointer (BP) (also known as frame pointer) and stack frame
Lets take more "advanced" function and rewrite it in our asm-like C++ (snippet #8.1):
Now imagine we decided to introduce new local variable to store result there before returning, as we do in
tripple
(snippet #4.1). The body of the function will be (snippet #8.2):You see, we had to update every single reference to function parameters and local variables. To avoid that, we need an anchor index, which doesn't change when the stack grows.
We will create the anchor right upon function entry (before we allocate space for locals) by saving current top (value of SP) into BP register. Snippet #8.3:
The slice of stack, wich belongs to and is in full control of the function is called function's stack frame. E.g.
myAlgo_noLPR_withAnchor
's stack frame isstack[996 .. 994]
(both idexes inclusive).Frame starts at function's BP (after we've updated it inside function) and lasts until the next stack frame. So the parameters on the stack are part of the caller's stack frame (see note 8a).
Notes:
8a. Wikipedia says otherwise about parameters, but here I adhere to Intel software developer's manual, see vol. 1, section 6.2.4.1 Stack-Frame Base Pointer and Figure 6-2 in section 6.3.2 Far CALL and RET Operation. Function's parameters and stack frame are part of function's activation record (see The gen on function perilogues).
8b. positive offsets from BP point to function parameters and negative offsets point to local variables. That's pretty handy for debugging
8c.
stack[BP]
stores the address of the previous stack frame,stack[stack[BP]]
stores pre-previous stack frame and so on. Following this chain, you can discover frames of all the functions in the programm, which didn't return yet. This is how debuggers show you call stack8d. the first 3 instructions of
myAlgo_noLPR_withAnchor
, where we setup the frame (save old BP, update BP, reserve space for locals) are called function prologue9. Calling conventions
In snippet 8.1 we've pushed parameters for
myAlgo
from right to left and returned result inAX
. We could as well pass params left to right and return inBX
. Or pass params in BX and CX and return in AX. Obviously, caller (main()
) and called function must agree where and in which order all this stuff is stored.Calling convention is a set of rules on how parameters are passed and result is returned.
In the code above we've used cdecl calling convention:
myAlgo_noLPR_withAnchor
function in our case), such that the caller (main
function) can rely on those registers not having been changed by a call.(Source: example "32-bit cdecl" from Stack Overflow Documentation; copyright 2016 by icktoofay and Peter Cordes ; licensed under CC BY-SA 3.0. An archive of the full Stack Overflow Documentation content can be found at archive.org, in which this example is indexed by topic ID 3261 and example ID 11196.)
10. Getting rid of function calls
Now the most interesting part. Just like data, executable code is also stored in memory (completely unrelated to memory for stack) and every instruction has an address.
When not commanded otherwise, CPU executes instructions one after another, in the order they are stored in memory. But we can command CPU to "jump" to another location in memory and execute instructions from there on. In asm it can be any address, and in more high-level languages like C++ you can only jump to addresses marked by labels (there are workarounds but they are not pretty, to say the least).
Let's take this function (snippet #10.1):
And instead of calling
tripple
C++ way, do the following:tripple
's body insidemyAlgo
myAlgo
entry jump overtripple
's code withgoto
tripple
's code, save on the stack address of the code line just aftertripple
call, so we can return here later and continue execution (PUSH_ADDRESS
macro below)tripple
function and execute it to the end (3. and 4. together areCALL
macro)tripple
(after we've cleaned up locals), take return address from the top of the stack and jump there (RET
macro)Because there is no easy way to jump to particular code address in C++, we will use labels to mark places of jumps. I won't go into detail how macros below work, just believe me they do what I say they do (snippet #10.2):
Notes:
10a. because return address is stored on the stack, in principle we can change it. This is how stack smashing attack works
10b. the last 3 instructions at the "end" of
triple_label
(cleanup locals, restore old BP, return) are called function's epilogue11. Assembly
Now let's look at real asm for
myAlgo_withCalls
. To do that in Visual Studio:One difference with our asm-like C++ is that asm's stack operate on bytes instead of ints. So to reserve space for one
int
, SP will be decremented by 4 bytes.Here we go (snippet #11.1, line numbers in comments are from the gist):
And asm for
tripple
(snippet #11.2):Hope, after reading this post, assembly doesn't look as cryptic as before :)
Here are links from the post's body and some further reading: