How we access stack variables without poping them?

2020-07-22 18:15发布

问题:

As far as i know there are two kind of variables in C, stack variables and heap variables. Stack variables are fast and managed by compiler and CPU automatically. My question about stack variables are these:

  • Are stack variables really stored in a stack FILO data structure?
  • If so, why we can use them without poping them and losing their values?
  • Why stack is used for storing them? What's wrong with a dequeue or list?

回答1:

You, as the programmer, don't worry about pushing or popping variables on the stack; the generated machine code handles all that for you. Every time you call a function, the program pushes items onto the hardware stack. Some of these items are data you pass to the function, but most of it is the current program state (register values, return addresses, etc.), such that the program can continue executing at the right place when the function returns.

An example may help. Take the following simple C program:

#include <stdio.h>

int afunc( int a, int b )
{
  int c = a * b;
  return c;
}

int main( void )
{
  int x;
  int y;
  int z;

  x = 2;
  y = 3;

  z = afunc( x, y );
  printf( "z = %d\n", z );

  return 0;
}

Compiling it with gcc 2.96 on an aging Red Hat box as follows:

gcc -o demo -g -std=c99 -pedantic -Wall -Werror -Wa,-aldh=demo.lst demo.c

gives me the following output listing:

   1                            .file   "demo.c"
   2                            .version        "01.01"
   5                    .text
   6                    .Ltext0:
 165                            .align 4
 169                    .globl afunc
 171                    afunc:
   1:demo.c        **** #include <stdio.h>
   2:demo.c        ****
   3:demo.c        **** int afunc( int a, int b )
   4:demo.c        **** {
 173                    .LM1:
 174                    .LBB2:
 175 0000 55                    pushl   %ebp
 176 0001 89E5                  movl    %esp, %ebp
 177 0003 83EC04                subl    $4, %esp
   5:demo.c        ****   int c = a * b;
 179                    .LM2:
 180 0006 8B4508                movl    8(%ebp), %eax
 181 0009 0FAF450C              imull   12(%ebp), %eax
 182 000d 8945FC                movl    %eax, -4(%ebp)
   6:demo.c        ****   return c;
 184                    .LM3:
 185 0010 8B45FC                movl    -4(%ebp), %eax
 186 0013 89C0                  movl    %eax, %eax
   7:demo.c        **** }
 188                    .LM4:
 189                    .LBE2:
 190 0015 C9                    leave
 191 0016 C3                    ret
 192                    .Lfe1:
 197                    .Lscope0:
 199                                    .section        .rodata
 200                    .LC0:
 201 0000 7A203D20              .string "z = %d\n"
 201      25640A00
 202                    .text
 203 0017 90                    .align 4
 205                    .globl main
 207                    main:
   8:demo.c        ****
   9:demo.c        **** int main( void )
  10:demo.c        **** {
 209                    .LM5:
 210                    .LBB3:
 211 0018 55                    pushl   %ebp
 212 0019 89E5                  movl    %esp, %ebp
 213 001b 83EC18                subl    $24, %esp
  11:demo.c        ****   int x;
  12:demo.c        ****   int y;
  13:demo.c        ****   int z;
  14:demo.c        ****
  15:demo.c        ****   x = 2;
 215                    .LM6:
 216 001e C745FC02              movl    $2, -4(%ebp)
 216      000000
  16:demo.c        ****   y = 3;
 218                    .LM7:
 219 0025 C745F803              movl    $3, -8(%ebp)
 219      000000
  17:demo.c        ****
  18:demo.c        ****   z = afunc( x, y );
 221                    .LM8:
 222 002c 83EC08                subl    $8, %esp
 223 002f FF75F8                pushl   -8(%ebp)
 224 0032 FF75FC                pushl   -4(%ebp)
 225 0035 E8FCFFFF              call    afunc
 225      FF
 226 003a 83C410                addl    $16, %esp
 227 003d 89C0                  movl    %eax, %eax
 228 003f 8945F4                movl    %eax, -12(%ebp)
  19:demo.c        ****   printf( "z = %d\n", z );
 230                    .LM9:
 231 0042 83EC08                subl    $8, %esp
 232 0045 FF75F4                pushl   -12(%ebp)
 233 0048 68000000              pushl   $.LC0
 233      00
 234 004d E8FCFFFF              call    printf
 234      FF
 235 0052 83C410                addl    $16, %esp
  20:demo.c        ****
  21:demo.c        ****   return 0;
 237                    .LM10:
 238 0055 B8000000              movl    $0, %eax
 238      00
  22:demo.c        **** }
 240                    .LM11:
 241                    .LBE3:
 242 005a C9                    leave
 243 005b C3                    ret
 244                    .Lfe2:
 251                    .Lscope1:
 253                            .text
 255                    .Letext:
 256                            .ident  "GCC: (GNU) 2.96 20000731 (Red Hat Linux 7.2 2.96-112.7.2)"

So, starting with main, we have the following lines:

 211 0018 55                    pushl   %ebp
 212 0019 89E5                  movl    %esp, %ebp
 213 001b 83EC18                subl    $24, %esp

%esp points to the top of the stack; %ebp points into the stack frame, between the local variables and function arguments. These lines save the current value of %ebp by pushing it onto the stack, then write the location of the current top of the stack to %ebp, and then advance %esp by 24 bytes (the stack grows "down", or towards decreasing addresses, on x86). Stepping through the execution of this program in a debugger on my system, we see the stack is set up as follows1:

Address        0x00  0x01  0x02  0x03    
-------        ----  ----  ----  ----
0xbfffd9d8     0xbf  0xff  0xda  0x18  <-- %ebp, 0xbfffda18 is the previous value
0xbfffd9d4     0x08  0x04  0x96  0x40  <-- x
0xbfffd9d0     0x08  0x04  0x95  0x40  <-- y
0xbfffd9cc     0x08  0x04  0x84  0x41  <-- z
0xbfffd9c8     0xbf  0xff  0xd9  0xe8
0xbfffd9c4     0xbf  0xff  0xda  0x44
0xbfffd9c0     0x40  0x01  0x5e  0x2c  <-- %esp

Then we have the lines

216 001e C745FC02              movl    $2, -4(%ebp)

and

219 0025 C745F803              movl    $3, -8(%ebp)

which assign 2 and 3 to x and y, respectively. Note that these locations are referred to as offsets from %ebp. So now our stack looks like this:

Address        0x00  0x01  0x02  0x03    
-------        ----  ----  ----  ----
0xbfffd9d8     0xbf  0xff  0xda  0x18  <-- %ebp
0xbfffd9d4     0x00  0x00  0x00  0x02  <-- x
0xbfffd9d0     0x00  0x00  0x00  0x03  <-- y
0xbfffd9cc     0x08  0x04  0x84  0x41  <-- z
0xbfffd9c8     0xbf  0xff  0xd9  0xe8
0xbfffd9c4     0xbf  0xff  0xda  0x44
0xbfffd9c0     0x40  0x01  0x5e  0x2c <-- %esp

Now we call afunc. To do that, we first need to push the arguments x and y on the call stack:

 222 002c 83EC08                subl    $8, %esp
 223 002f FF75F8                pushl   -8(%ebp)
 224 0032 FF75FC                pushl   -4(%ebp)

So now our stack looks like

Address        0x00  0x01  0x02  0x03    
-------        ----  ----  ----  ----
0xbfffd9d8     0xbf  0xff  0xda  0x18  <-- %ebp
0xbfffd9d4     0x00  0x00  0x00  0x02  <-- x
0xbfffd9d0     0x00  0x00  0x00  0x03  <-- y
0xbfffd9cc     0x08  0x04  0x84  0x41  <-- z
0xbfffd9c8     0xbf  0xff  0xd9  0xe8
0xbfffd9c4     0xbf  0xff  0xda  0x44
0xbfffd9c0     0x40  0x01  0x5e  0x2c
0xbfffd9bc     0x40  0x14  0xd7  0xf0
0xbfffd9b8     0x40  0x14  0xe8  0x38
0xbfffd9b4     0x00  0x00  0x00  0x03 <-- b
0xbfffd9b0     0x00  0x00  0x00  0x02 <-- a, %esp

Now we call afunc. First thing we do is save the current value of %ebp, then adjust our registers again:

 175 0000 55                    pushl   %ebp
 176 0001 89E5                  movl    %esp, %ebp
 177 0003 83EC04                subl    $4, %esp

Leaving us with

Address        0x00  0x01  0x02  0x03    
-------        ----  ----  ----  ----
0xbfffd9d8     0xbf  0xff  0xda  0x18  
0xbfffd9d4     0x00  0x00  0x00  0x02  <-- x
0xbfffd9d0     0x00  0x00  0x00  0x03  <-- y
0xbfffd9cc     0x08  0x04  0x84  0x41  <-- z
0xbfffd9c8     0xbf  0xff  0xd9  0xe8
0xbfffd9c4     0xbf  0xff  0xda  0x44
0xbfffd9c0     0x40  0x01  0x5e  0x2c 
0xbfffd9bc     0x40  0x14  0xd7  0xf0
0xbfffd9b8     0x40  0x14  0xe8  0x38
0xbfffd9b4     0x00  0x00  0x00  0x03 <-- b
0xbfffd9b0     0x00  0x00  0x00  0x02 <-- a
0xbfffd9ac     0x08  0x04  0x84  0x9a 
0xbfffd9a8     0xbf  0xff  0xd9  0xd8 <-- %ebp
0xbfffd9a4     0x40  0x14  0xd7  0xf0 <-- c, %esp

Now we perform our computation in afunc:

 180 0006 8B4508                movl    8(%ebp), %eax
 181 0009 0FAF450C              imull   12(%ebp), %eax
 182 000d 8945FC                movl    %eax, -4(%ebp)

Note the offsets relative to %ebp: this time they're positive (function arguments are stored "below" %ebp, where local variables are stored "above" it). The result is then stored in c:

Address        0x00  0x01  0x02  0x03    
-------        ----  ----  ----  ----
0xbfffd9d8     0xbf  0xff  0xda  0x18  
0xbfffd9d4     0x00  0x00  0x00  0x02  <-- x
0xbfffd9d0     0x00  0x00  0x00  0x03  <-- y
0xbfffd9cc     0x08  0x04  0x84  0x41  <-- z
0xbfffd9c8     0xbf  0xff  0xd9  0xe8
0xbfffd9c4     0xbf  0xff  0xda  0x44
0xbfffd9c0     0x40  0x01  0x5e  0x2c 
0xbfffd9bc     0x40  0x14  0xd7  0xf0
0xbfffd9b8     0x40  0x14  0xe8  0x38
0xbfffd9b4     0x00  0x00  0x00  0x03 <-- b
0xbfffd9b0     0x00  0x00  0x00  0x02 <-- a
0xbfffd9ac     0x08  0x04  0x84  0x9a 
0xbfffd9a8     0xbf  0xff  0xd9  0xd8 <-- %ebp
0xbfffd9a4     0x00  0x00  0x00  0x06 <-- c, %esp

Function return values are stored in the register %eax. Now we return from the function:

 185 0010 8B45FC                movl    -4(%ebp), %eax
 186 0013 89C0                  movl    %eax, %eax

 190 0015 C9                    leave
 191 0016 C3                    ret

When we return from the function, we pop everything off the stack back to where %esp was pointing before we entered afunc (there's some magic there, but recognize that %ebp was pointing to an address containing the old value of %ebp):

Address        0x00  0x01  0x02  0x03    
-------        ----  ----  ----  ----
0xbfffd9d8     0xbf  0xff  0xda  0x18  <-- %ebp
0xbfffd9d4     0x00  0x00  0x00  0x02  <-- x
0xbfffd9d0     0x00  0x00  0x00  0x03  <-- y
0xbfffd9cc     0x08  0x04  0x84  0x41  <-- z
0xbfffd9c8     0xbf  0xff  0xd9  0xe8
0xbfffd9c4     0xbf  0xff  0xda  0x44
0xbfffd9c0     0x40  0x01  0x5e  0x2c  <-- %esp

And now we save the result to z:

 228 003f 8945F4                movl    %eax, -12(%ebp)

Leaving us with:

Address        0x00  0x01  0x02  0x03    
-------        ----  ----  ----  ----
0xbfffd9d8     0xbf  0xff  0xda  0x18  <-- %ebp
0xbfffd9d4     0x00  0x00  0x00  0x02  <-- x
0xbfffd9d0     0x00  0x00  0x00  0x03  <-- y
0xbfffd9cc     0x00  0x00  0x00  0x06  <-- z
0xbfffd9c8     0xbf  0xff  0xd9  0xe8
0xbfffd9c4     0xbf  0xff  0xda  0x44
0xbfffd9c0     0x40  0x01  0x5e  0x2c  <-- %esp

Note that this is how things look on a particular hardware/software combination and a particular compiler; the details will vary between compilers (the latest version of gcc uses registers to pass function arguments anywhere it can, rather than pushing them on to the stack), but the general concepts will be the same. Just don't assume this is the way things are done.


1. The values stored between 0xbfffd9c4 and 0xbfffd9c8 are (most likely) not related to our code; they're just the bit patterns that were left after those memory locations were used in another operation. I think the compiler assumes a minimum amount of space for setting up the local frame.



回答2:

When talking about heap and stack, stack is not the usual data structure students reimplement in 'data structures' lectures with pointers and stuff. The stack is usually just a chunk of memory that functions allocate memory for their local variables from.

Stack is pseudo-FILO: its basic element is a ''function frame'' -- a collection of all stack variables a particular function uses. Within a function frame you can access all variables you want without a performance penalty. However, you can't access variables from other functions frames without popping (i.e. returning).



回答3:

The C standard does not talk about the stack of heap just automatic variables(local variables which will usually be allocated on the stack) and dynamic variables(via malloc etc... which will usually be allocated on the heap), there are also static variables too. Where automatic, dynamic and static variables are stored is implementation defined behavior.

On most modern system automatic variables will indeed be stored on a stack but for example gcc on x86 will usually just allocate space on the stack and then use offsets to access the stack as opposed to poping and pushing.



标签: c stack heap