Cost of static memory allocation vs dynamic memory

2020-02-08 07:31发布

I am very interested to know what is the preferred method of memory allocation static vs dynamic is good for performance (e.g., running time) when you know the exact number of objects/items in C on Linux. Cost for a small number of objects (small amount of memory) and as well as for a large number of objects (huge amount of memory).

e.g., type A[N] vs type *A = malloc(sizeof(type) * N)

Please let me know. Thank you.

Note: We can benchmark this and probably know the answer. But I would like to know the concepts that explain the performance differences between these two allocation methods.

4条回答
地球回转人心会变
2楼-- · 2020-02-08 08:09

True static allocations (globals, and local variables marked static) are glued into sections and loaded along with the rest of the section at load time (execve) using mmap. They're essentially free, already covered by the cost of loading the program.

Automatic arrays with statically known sizes can be glued together with other local variables and allocated by adjusting the stack pointer. That's essentially one integer subtraction (the stack grows downwards) or something close to 1 ns on modern processors.

Variable length arrays can't be glued to other variables so they should cost about 1 ns each.

I tried measuring mallocs with different sizes in a single threaded process and I got this, which would imply that malloc+free pairs cost about 50-100 times as much as stack variables for allocations <16MiB. After that, it spikes upwards (32MiB and 64MiB both cost about a hundred times as much as allocations <=16MiB do):

Malloc times

These are just the costs of obtaining the pointer though. Actual access may cause page faults and thereby further increase the cost of the memory block. The page faults should be extremely rare with stack allocations, but likely with first-time accesses to dynamically allocated memory.

Additionally, many small dynamic allocations will put quite a strain on caches as your memory will be fragmented. (You can reduce this fragmentation by using memory pools, either your own or the obstack functionality available as part of glibc.)

查看更多
欢心
3楼-- · 2020-02-08 08:18

If you know exactly how much space you need to set aside and your primary concern is performance with respect to memory management and your code doesn't need to be re-entrant, then you will want to allocate objects with static storage duration, either by declaring them at file scope or by using the static storage class specifier.

int data[SOME_NUMBER];

void foo( /* some list of parameters here */ )
{
  static int some_more_data[SOME_OTHER_NUMBER];
  ...
}

Both data and some_more_data exist over the lifetime of the program, but some_more_data is only visible within the foo function1.

In practice, items with static storage duration have space set aside for them in the binary image itself, so that memory is available as soon as the program is loaded. This may have an advantage as far as locality is concerned; if the data and the code are in the same page, there's no need to swap. Of course, it depends on the code.

The obvious drawback is that your executable file will have a larger footprint. Another drawback is that your code isn't re-entrant; every invocation of foo is working on the same block of memory when it reads or writes some_more_data. Depending on what your code does, that may or may not be a big deal.

If your code needs to be re-entrant, then you can't use objects with static storage duration. If the block of data is relatively small, use objects with automatic storage duration (i.e., regular local variables).

void foo( /* some list of parameters here */ )
{
  int some_more_data[SOME_SMALLISH_NUMBER];
  ...
}

In this case, some_more_data only exists for the lifetime of the foo function; each time you call foo, it allocates another some_more_data object automatically. Whatever overhead there is in setting aside the memory is part of the overhead of calling the function in the first place (at least in my experience). The stack pointer still gets adjusted whether you're using local variables or not, so using local variables won't make it any slower. The main issue is that the available memory for automatic objects is relatively small; for objects above a certain size, this approach simply won't work.

If your code needs to be re-entrant and you need to allocate large blocks of memory, you're pretty much stuck with dynamic memory management (malloc/calloc/realloc and free). Depending on how you design your code, you can minimize some of the performance issues.


1. Visibility rules are enforced during translation from source to machine code; they don't really apply at run time.

查看更多
▲ chillily
4楼-- · 2020-02-08 08:19

Static allocation will be much faster. Static allocation can happen at global scope, and on the stack.

In global scope statically allocated memory is built into the binary image. That is the total size of the memory required, and where it is to be located in the running binary is computed at compile time. Then when the program loads the operating system loader will allocate enough memory for all the global static arrays. I'm pretty sure it happens in constant time for all the allocations. ( e.g. more allocations don't cost more time )

In local scope, static allocations are allocated on the stack. This involves simply reserving a fixed number of bytes on the stack, and happens in constant time per allocation. Stack space is very limited.

Dynamic memory must be allocated from a heap, and even in the best case most allocations will take time that scales more than linear with each allocation, like n log n time or something.

Also practically speaking the dynamic allocation will be many times slower than static allocation.

@update: As has been pointed out in comments below: stack allocations are not technically static allocations ( but they are allocations with the syntactic form used in OP's question ).

Also when speaking about "allocation time", I'm considering total time to manage the memory ( alloc and free ).

In some dynamic allocators allocation time is faster than freeing time.

It is also true that some fast allocators trade memory efficiency for allocation speed. In these cases static is still "better" in that static and stack allocs are no time and constant time respectively while allocating exact sized blocks.

Dynamic allocators to be fast trade off significant memory efficiency ( e.g. buddy allocators round up to the next power of two sized block, like 33k alloc will use 64k )

查看更多
Explosion°爆炸
5楼-- · 2020-02-08 08:32

Global memory is normally allocated when your program (or shared module/dll) loads and pre-populated with its initial values. This typically has its own memory section.

The stack is a block (series of memory pages) of memory allocated by the kernel when you create a new thread. Allocating stack memory on the stack typically is done by decremeting the stack pointer (one machine instruction - (stack a typically full descening - newer allocations have lower addresses). When a stack object is deleted, the stack pointer is incremented). Stacks therefore have strict first in last out semantic. Stack are also fixed size so when you run out you get a stack overflow.

When one uses malloc (in C) or new (C++), memory is allocated on the heap/free-store. This is any number memory block. When more memory is needed than has been already allocated, malloc goes to the kernel to request more memory from the system. Typically malloc will use freed memory already obtained from the system.

Calls to malloc must be assumed to be slow and should be avoided for any performance critical or real-time routines. This is for 2 reasons.

  1. The heap needs to allocate and free any size of memory in any order. Older algorithms used to walk a list of freed blocks until one of suitable size was found. Since memory could be highly fragmented, this could potentially be slow. If the heap is used on multiple threads some sort of locking needs to be provided. Much research has been done to improve this situation and modern heaps the jemalloc and tcmalloc do improve things. However, don't count on this.
  2. If required memory cannot be allocated from what is already managed by the heap allocator, the heap allocator will need to ask the kernel for more memory. (On unix this is done with the mmap or sbrk system calls). The kernel needs to find some unallocated memory pages and the has to map these pages into your processes memory space). Any form of caching goes out the window). So expect this to be super slow (for a computer).

So if you need do any performance critical processing, allocate all the memory up front and then reuse the already allocated memory. Always assume calls to malloc and free will be slow.

查看更多
登录 后发表回答