Stack allocation inside a loop

Posted 2019-08-07 05:03

Question:

In C, when you write code like this:

void some_function(void) {
    while (something) {
        char buf[4096];
        ...
    }
}

Does the allocation of buf happen when the function is called? Or does a separate allocation happen for every iteration of the loop?

Would there be any performance gain if I put the declaration of buf outside of the loop (i.e. at the beginning of the function)?

Answer 1:

buf lives in the stack frame of some_function. With typical compilers the whole frame is reserved once, when some_function is called, so moving the declaration of buf outside the loop gives no performance gain.

It is different, though, when the declaration has an initializer, as in

while (...) {
  int a = 5;
}

in the loop. Here the initialization of a is performed on every iteration.



Answer 2:

Conceptually, the buffer is created anew on each iteration of the loop. Compilers optimize, though: where there is no initializer, they emit no per-iteration code to allocate the space, and the result is the same as if the variable were declared outside the loop. Add an initializer to the definition and you will see an effect on performance, because the initialization is carried out on every iteration of the loop.
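The point about initializers can be checked with a minimal sketch. The helper name count_fresh_iterations and the 16-byte buffer are assumptions made for illustration, not part of the question. Each pass overwrites the buffer, yet the next pass starts with the initial string again, because the initializer runs every time the declaration is reached.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: runs `iterations` passes of a loop whose body declares
   an initialized buffer, and counts the passes that began with the buffer
   holding its initial contents. */
size_t count_fresh_iterations(size_t iterations)
{
    size_t fresh = 0;
    for (size_t i = 0; i < iterations; i++) {
        char buf[16] = "Something";   /* initializer runs on every pass */

        if (strcmp(buf, "Something") == 0)
            fresh++;

        /* Overwrite the buffer; the next pass still starts from "Something". */
        memset(buf, 'x', sizeof buf - 1);
        buf[sizeof buf - 1] = '\0';
        assert(buf[0] == 'x');
    }
    return fresh;
}
```

Calling count_fresh_iterations(5) returns 5: every pass sees a freshly initialized buffer, which is the per-iteration work this answer refers to.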



Answer 3:

Allocating on the stack on a general purpose processor (x86, PowerPC, ARM) is at most a single instruction that changes the stack pointer register. This won't affect performance much at all (see note below). In addition, the compiler can hoist the stack allocation outside of the loop for you as well. Bottom line is that the gain would be small to none.

Note: Changing the stack pointer register can introduce instruction dependencies in out-of-order processors.



Answer 4:

The C standard allows the compiler to allocate and deallocate in each iteration or once for the function. In practice, every compiler I've seen allocates for the function, and that's quite a few. Even if allocation were once per iteration, however, the difference would be ~2 instructions to bump the stack pointer down and up (or up and down for upward-growing stacks). Seeing a significant performance difference would be rare.
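One way to see what a particular compiler does is to compare the address of buf across iterations. The sketch below is only an observation aid: the function name count_address_changes is hypothetical, and the result is implementation-specific, since the standard does not promise that the address stays the same.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: counts how often the address of a loop-local buffer
   differs from its address on the previous pass. The address is stored as
   an integer, so it is never dereferenced after the buffer's lifetime ends. */
size_t count_address_changes(size_t iterations)
{
    uintptr_t previous = 0;
    size_t changes = 0;

    for (size_t i = 0; i < iterations; i++) {
        char buf[4096];
        buf[0] = (char)i;   /* touch the buffer so it is actually used */

        uintptr_t current = (uintptr_t)(void *)buf;
        if (i > 0 && current != previous)
            changes++;
        previous = current;
    }

    /* Sanity check: there can be at most iterations - 1 changes. */
    assert(iterations == 0 || changes < iterations);
    return changes;
}
```

With a compiler that reserves the frame once per function call, this is expected to return 0 for any iteration count; a nonzero result would indicate that the implementation relocates the buffer between iterations.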



Answer 5:

In your specific case, there is probably no performance hit. In the WORST case (no optimization whatsoever), allocating buf is something like:

 sub sp, #4096

and deallocating is something like

 add sp, #4096

Keep in mind that even without optimization, this is likely to occur only once for all local variables defined in the loop. If the loop body declares both char buf[4096] and an int x, for example,

it would likely be translated into something like

 sub sp, #4100
 . . . . 
 add sp, #4100

So doing

void some_function(void) {
    char buf[4096];
    while (something) {
        int x;
        ...
    }
}

would make no difference whatsoever to performance.

Adding initializations:

void some_function(void) {
    while (something) {
        char buf[4096] = "Something";
        int x;
        ...
    }
}

does add a cost, because the whole array is initialized again on every iteration. In most cases the overhead will be small.

However, constructing an object inside a loop when its setup opens a network connection will slow things down greatly.

It's a matter of balance. For most applications,

    char buf[4096] = "Something";

is not noticeable. In a loop handling real-time interrupts, it could be critical.

Code for clarity. Keeping a variable's scope as limited as possible improves clarity. Performance comes from design, not coding. If you find through actual measurement that some particular coding construct is making things run slowly, then you can change the code.
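For the "actual measurement" step, a rough sketch of a comparison might look like the following. Everything here is an assumption made for illustration: the names sum_inner, sum_outer and time_call, the memset workload, and the use of clock() as a coarse CPU timer. A serious benchmark would need many repetitions and care that the optimizer does not remove the work.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

enum { BUF_SIZE = 4096 };

/* Variant A: buffer declared inside the loop body. */
unsigned long sum_inner(size_t iterations)
{
    unsigned long total = 0;
    for (size_t i = 0; i < iterations; i++) {
        char buf[BUF_SIZE];
        memset(buf, (int)(i & 0x7f), sizeof buf);
        total += (unsigned char)buf[i % BUF_SIZE];
    }
    return total;
}

/* Variant B: buffer declared once, before the loop. */
unsigned long sum_outer(size_t iterations)
{
    unsigned long total = 0;
    char buf[BUF_SIZE];
    for (size_t i = 0; i < iterations; i++) {
        memset(buf, (int)(i & 0x7f), sizeof buf);
        total += (unsigned char)buf[i % BUF_SIZE];
    }
    return total;
}

/* Runs fn once, stores its result, and returns the CPU time used in seconds. */
double time_call(unsigned long (*fn)(size_t), size_t iterations,
                 unsigned long *result)
{
    assert(fn != NULL && result != NULL);
    clock_t start = clock();
    *result = fn(iterations);
    clock_t end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Both variants return the same checksum, so time_call(sum_inner, n, &r) and time_call(sum_outer, n, &r) can be compared directly for a large n. If the two timings are indistinguishable, the narrower scope is the better choice.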