Is accessing data in the heap faster than from the

2019-01-22 06:44发布

问题:

I know this sounds like a general question and I've seen many similar questions (both here and on the web) but none of them are really like my dilemma.

Say I have this code:

void GetSomeData(char* buffer)
{
    // put some data in buffer
}

int main()
{
     char buffer[1024];
     while(1)
     {
          GetSomeData(buffer);
          // do something with the data
     }
     return 0;
}

Would I gain any performance if I declared buffer[1024] globally?

I ran some tests on unix via the time command and there are virtually no differences between the execution times.

But I'm not really convinced...

In theory should this change make a difference?

回答1:

Is accessing data in the heap faster than from the stack?

Not inherently... on every architecture I've ever worked on, all the process "memory" can be expected to operate at the same set of speeds, based on which level of CPU cache / RAM / swap file is holding the current data, and any hardware-level synchronisation delays that operations on that memory may trigger to make it visible to other processes, incorporate other processes'/CPU (core)'s changes etc..

The OS (which is responsible for page faulting / swapping), and the hardware (CPU) trapping on accesses to swapped-out or not-yet-accessed pages, would not even be tracking which pages are "stack" vs "heap"... a memory page is a memory page. That said, the virtual address of global data may be able to be calculated and hardcoded at compile time, the addresses of stack-based data are typically stack-pointer relative, while memory on the heap must almost always be accessed using pointers, which might be slightly slower on some systems - it depends on the CPU addressing modes and cycles, but it's almost always insignificant - not even worth a look or second thought unless you're writing something where millionths of a second are enormously important.

Anyway, in your example you're contrasting a global variable with a function-local (stack/automatic) variable... there's no heap involved. Heap memory comes from new or malloc/realloc. For heap memory, the performance issue worth noting is that the application itself is keeping track of how much memory is in use at which addresses - the records of all that take some time to update as pointers to memory are handed out by new/malloc/realloc, and some more time to update as the pointers are deleted or freed.

For global variables, the allocation of memory may effectively be done at compile time, while for stack based variables there's normally a stack pointer that's incremented by the compile-time-calculated sum of the sizes of local variables (and some housekeeping data) each time a function is called. So, when main() is called there may be some time to modify the stack pointer, but it's probably just being modified by a different amount rather than not modified if there's no buffer and modified if there is, so there's no difference in runtime performance at all.



回答2:

Quoting from Jeff Hill's answer:

The stack is faster because the access pattern makes it trivial to allocate and deallocate memory from it (a pointer/integer is simply incremented or decremented), while the heap has much more complex bookkeeping involved in an allocation or free. Also, each byte in the stack tends to be reused very frequently which means it tends to be mapped to the processor's cache, making it very fast. Another performance hit for the heap is that the heap, being mostly a global resource, typically has to be multi-threading safe, i.e. each allocation and deallocation needs to be - typically - synchronized with "all" other heap accesses in the program.



回答3:

Your question doesn't really have an answer; it depends on what else you are doing. Generally speaking, most machines use the same "memory" structure over the entire process, so regardless of where (heap, stack or global memory) the variable resides, access time will be identical. On the other hand, most modern machines have a hierarchial memory structure, with a memory pipeline, several levels of cache, main memory, and virtual memory. Depending on what has gone on previously on the processor, the actual access may be to any one of these (regardless of whether it is heap, stack or global), and the access times here vary enormously, from a single clock if the memory is in the right place in the pipeline, to something around 10 milliseconds if the system has to go to virtual memory on disk.

In all cases, the key is locality. If an access is "near" a previous access, you greatly improve the chance of finding it in one of the faster locations: cache, for example. In this regard, putting smaller objects on the stack may be faster, because when you access the arguments of a function, you're access on stack memory (with an Intel 32-bit processor, at least---with better designed processors, arguments are more likely to be in registers). But this will probably not be an issue when an array is involved.



回答4:

when allocating buffers on stack the optimization scope is not the cost of accessing the memory but rather the elimination of often very expensive dynamic memory allocation on the heap (stack buffer allocation can be considered instantaneous as the stack as a whole is allocated at thread startup).



回答5:

Giving that variables and variable arrays that are declared on the heap is slower is just a fact. Think about it this way;

Globally created variables are allocated once and deallocated once the program is closing. For a heap object your variable has to be allocated on the spot each time the function is ran, and deallocated in the end of the function..

Ever tried allocating an object pointer within a function? Well better free / delete it before the function exits, orelse you will have yourself a memoryleak giving that you are not doing this in a class object where it is free'd/deleted inside the deconstructor.

When it comes to accessing of an array they all work the same, a memory block is first allocated by sizeof(DataType) * elements. Later can be accessed by ->

1 2 3 4 5 6 
^ entry point [0]
      ^ entry point [0]+3