I am currently reading about exploiting memory vulnerabilities under Linux, and I found it hard to find any information on when the layout of stack frames is decided. In other words, is the layout determined at compile time, before the program executes, or are frames built when a function is called? Does the layout differ between operating systems?
Question:
Answer 1:
There are several factors. On x86, there's a calling convention that defines how to call a function. I assume other architectures have similar things. The system library (e.g. glibc) can define additional conventions. But ultimately the compiler decides how it uses the stack, at least when it does not have to interface with external libraries and follow their stack layout.
Answer 2:
I doubt you will generally or easily find a documented answer to how stack frames were designed. As others have observed, what gets documented is the result of that design process, often without much of the associated rationale, which I agree would be pretty interesting.
Each stack frame layout presumably comes from the people designing a compiler, or a set of interoperable compilers, for a particular processor architecture and maybe even for a particular OS. The design is influenced by what information a subroutine needs to access from its callers (arguments? lexical scopes?), what the instruction set does well (lots of registers? easy to push arguments?), the strengths or weaknesses of the compilers, etc. Microsoft, as an example, did this design several times over decades as their compilers and the x86 evolved; their conventions for x86-32 are really different from those for x86-64. You can guess at the rationale from the documented result, and sometimes there are hints, but not always.
I can give you some ideas, having designed "stack frames" for my company's parallel programming language that runs on an x86.
- Because the language is parallel, stack frames are heap allocated (from an extremely fast, thread-local block allocator), not stack allocated, so "stack frame" isn't quite the right term; we call them "activation records". (I'll continue to call them "stack frames" in this discussion.) This scheme supports parallel programming, where one function can fork multiple parallel subcomputations, each of which needs its own stack frame; they obviously can't share a single stack. This means each stack frame has to contain an explicit pointer to the previous frame to enable a callee to return. So there is a slot at a low offset in the stack frame to hold the caller's stack frame pointer. Similarly, there is a slot to hold the caller's stack pointer. These two slots are used instead of the PUSH EBP / LEA ESP, k[ESP] sequence traditionally used by x86 calling conventions.
- Lexical scoping requires that each callee have access to the lexical scopes of its parents. This is accomplished by setting aside a set of low-offset slots in the stack frame to hold a classic "display" (a set of pointers to containing scopes), and passing a pointer to the caller's display in ECX to the callee. The callee copies what it needs of the parent's display, perhaps augmenting it if the callee is not a leaf procedure.
- The CPU has a limited number of registers, so you can't pass all the parameters, or even many of them, in registers. We chose to pass one 32-bit argument in EAX, a second in EDX, or a 64-bit argument in EAX/EDX; larger argument lists are passed by pushing the arguments onto the stack and simply calling the subroutine. The callee wants access to the arguments, so we chose to allocate 2 slots at low offsets in the stack frame to hold EAX/EDX.
- Unlike single-threaded code, each PARLANSE stack frame represents a function with a possibly large number of statically defined parallel computations. The stack frame thus contains a set of "grain" (parallel thread) context blocks with associated fixed-size stacks, each of which a grain accesses through its ESP register. This scheme allows the compiler to do much of the work of allocating space and setting up parallel grains, minimizing the time to create a "grain", which in turn allows much smaller computations to usefully run in parallel. There's a lot of detail about what goes into each grain control block that isn't worth explaining here; the point is that a lot of detail is part of the stack frame design.
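To make the first three bullets concrete, here is a rough C sketch of what a heap-allocated activation record along those lines might look like. Everything in it is an assumption made for illustration: the field names and order, `DISPLAY_SLOTS`, `ar_enter` and `ar_leave` are hypothetical and are not PARLANSE's actual layout or API. The grain context blocks are left out, and the caller's stack pointer slot is only a placeholder because portable C cannot read ESP.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define DISPLAY_SLOTS 4  /* hypothetical maximum lexical nesting depth */

/* Hypothetical activation record; names and field order are illustrative. */
struct activation_record {
    struct activation_record *caller_frame;            /* explicit link to the caller's record */
    void *caller_sp;                                   /* slot for the caller's stack pointer */
    struct activation_record *display[DISPLAY_SLOTS];  /* pointers to enclosing scopes */
    uint32_t arg_lo;                                   /* slot for the argument passed in EAX */
    uint32_t arg_hi;                                   /* slot for the argument passed in EDX */
};

/* Field offsets are constants known to the compiler before the program runs. */
static_assert(offsetof(struct activation_record, caller_frame) == 0,
              "caller link sits at the lowest offset");

/* Allocate a record for a callee at lexical depth `depth` (0 = outermost).
   Returns NULL on allocation failure or invalid arguments. */
struct activation_record *ar_enter(struct activation_record *caller,
                                   unsigned depth, uint64_t arg)
{
    if (depth >= DISPLAY_SLOTS || (depth > 0 && caller == NULL))
        return NULL;

    struct activation_record *ar = calloc(1, sizeof *ar);
    if (ar == NULL)
        return NULL;

    ar->caller_frame = caller;
    ar->caller_sp = NULL;  /* placeholder: real generated code would save ESP here */

    /* Copy the part of the caller's display the callee needs, then add itself. */
    for (unsigned i = 0; i < depth; i++)
        ar->display[i] = caller->display[i];
    ar->display[depth] = ar;

    /* Spill the register-passed argument into its two 32-bit slots. */
    ar->arg_lo = (uint32_t)arg;
    ar->arg_hi = (uint32_t)(arg >> 32);
    return ar;
}

/* Free a record and return the caller's record, which the callee resumes in. */
struct activation_record *ar_leave(struct activation_record *ar)
{
    struct activation_record *caller = ar->caller_frame;
    free(ar);
    return caller;
}
```

Because every record is a separate heap block, one caller can have several live callee records at once (two `ar_enter(outer, 1, ...)` calls give two independent records that both link back to `outer`), which is what a single contiguous stack cannot offer.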
My point is the rationale for stack frame design is driven by the machine architecture and the goal of the programming language it is supposed to support. Rationale such as the above doesn't appear in many documents, and yes, that makes it pretty hard to find.
Given a stack frame design, the compiler for a language then allocates space within the frame, for a particular subroutine being compiled.
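As a rough illustration of that last step, the sketch below assigns frame offsets to a function's locals the way a very naive compiler might: in declaration order, respecting each variable's alignment. The names `local_var`, `align_up` and `layout_frame` are hypothetical, and real compilers are free to reorder locals, reuse slots, or keep values in registers, so treat this as a picture of the idea rather than of any particular compiler.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of one local variable of a function. */
struct local_var {
    const char *name;
    size_t size;   /* size in bytes */
    size_t align;  /* required alignment, a power of two */
};

/* Round `n` up to the next multiple of `align` (which must be a power of two). */
size_t align_up(size_t n, size_t align)
{
    return (n + align - 1) & ~(align - 1);
}

/* Assign each local an offset from the start of the frame's local area,
   in declaration order, and return the total size rounded up to frame_align. */
size_t layout_frame(const struct local_var *vars, size_t count,
                    size_t *offsets, size_t frame_align)
{
    size_t cursor = 0;
    for (size_t i = 0; i < count; i++) {
        /* Alignment must be a non-zero power of two for align_up to work. */
        assert(vars[i].align != 0 && (vars[i].align & (vars[i].align - 1)) == 0);
        cursor = align_up(cursor, vars[i].align);
        offsets[i] = cursor;
        cursor += vars[i].size;
    }
    return align_up(cursor, frame_align);
}
```

All of this arithmetic happens while compiling; the generated code only contains the resulting constants (the frame size to reserve and the offset of each variable).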
Answer 3:
It is a compiler choice made at compile time. If you use the same compiler with the same build options on different operating systems for the same processor/target, you may or may not get the same stack frame layout.
Stack frames make it easier for the compiler developer to debug the generated code and for others to read it; whether using a stack frame costs more is debatable. It might also make life easier for the debugger (software), but the debugger would have to be closely in sync with the compiler for that to work.
They are generally not required, and I can't imagine why a calling convention would ever care; it is simply an implementation choice. Do I keep track, at every point in the function, of where things are relative to a changing top of stack? Or do I pre-compute all the stack space the function will need, claim it once, and then for the rest of the function hardcode where everything is relative to that? The latter makes the code easier to read and debug, sometimes at the cost of another register and sometimes not, depending on the implementation.
Stack frames are a design choice by the compiler folks, made at compile time, not at run time. If you use the same compiler with the same options, you can get the same layout across operating systems; use a different compiler, on the same operating system or a different one, and there is no guarantee that the same layout is used, or that both even use a stack frame.
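One way to see the compile-time versus run-time split for yourself is the small probe below. It is a sketch that assumes GCC or Clang: `__builtin_frame_address` and `__attribute__((noinline))` are compiler extensions, not standard C, and the function name `probe` is made up for this example. At each recursion depth it records the frame address (which changes per call) and the distance from that address to a local variable (which the compiler fixed when it compiled the function).

```c
#include <assert.h>
#include <stdint.h>

/* Record, for each recursion depth, this call's frame address and the
   distance from that address to a local variable of the same call. */
__attribute__((noinline))
void probe(int depth, int max_depth, intptr_t *offsets, uintptr_t *frames)
{
    assert(depth >= 0 && depth < max_depth);

    volatile int local = depth;
    uintptr_t frame = (uintptr_t)__builtin_frame_address(0);

    frames[depth] = frame;                                  /* decided at run time */
    offsets[depth] = (intptr_t)((uintptr_t)&local - frame); /* fixed by the compiler */

    if (depth + 1 < max_depth)
        probe(depth + 1, max_depth, offsets, frames);

    local = 0;  /* use `local` after the call so the call cannot become a tail call */
}
```

Calling `probe(0, 4, offsets, frames)` with two four-element arrays should fill `frames` with four different addresses and `offsets` with the same value four times. The actual offset, and whether a frame pointer register is used at all, depends on the compiler and its options (for example GCC's `-fomit-frame-pointer` versus `-fno-omit-frame-pointer`).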