What are the next step to improve malloc() algorit

2020-06-19 07:22发布

I'm writing my own simple malloc() function and I would like to create more faster and efficient variant. I'm writed function that use linear search and allocated sequentially and contiguously in memory.

What is the next step to improve this algorithm? What are the main shortcomings of my current version? I would be very grateful for any feedback and recommendation.

typedef struct heap_block
{  
    struct heap_block* next;
    size_t size;
    bool isfree;
}header;

#define Heap_Capacity 100000
static char heap[Heap_Capacity];
size_t heap_size;

void* malloc(size_t sz) 
{
    if(sz == 0 || sz > Heap_Capacity) { return NULL; }

    header* block = (header*)heap;
    if(heap_size == 0)
    {
        set_data_to_block(block, sz);
        return (void*)(block+1);
    }

    while(block->next != NULL) { block = block->next; }

    block->next = (header*)((char*)to_end_data(block) + 8);
    header* new_block = block->next;
    set_data_to_block(new_block, sz);

    return (void*)(new_block+1);
}

void set_data_to_block(header* block, size_t sz)
{
    block->size = sz;
    block->isfree = false;
    block->next = NULL;
    heap_size += sz;
}

header* to_end_data(header* block)
{
    return (header*)((size_t)(block+1) + block->size);
}

2条回答
在下西门庆
2楼-- · 2020-06-19 07:42

Notice that malloc is often built above lower level memory related syscalls (e.g. mmap(2) on Linux). See this answer which mentions GNU glibc and musl-libc. Look also inside tcmalloc, so study the source code of several free software malloc implementations.

Some general ideas for your malloc:

  • retrieve memory from the OS using mmap (and release it back to the OS kernel eventually with munmap). You certainly should not allocate a fixed size heap (since on a 64 bits computer with 128Gbytes of RAM you would want to succeed in malloc-ing a 10 billion bytes zone).
  • segregate small allocations from big ones, so handle differently malloc of 16 bytes from malloc of a megabyte. The typical threshold between small and large allocation is generally a small multiple of the page size (which is often 4Kbytes). Small allocations happen inside pages. Large allocations are rounded to pages. You might even handle very specially malloc of two words (like in many linked lists).
  • round up the requested size to some fancy number (e.g. powers of 2, or 3 times a power of 2).
  • manage together memory zones of similar sizes, that is of same "fancy" size.
  • for small memory zones, avoid reclaiming too early the memory zone, so keep previously free-d zones of same (small) size to reuse them in future calls to malloc.
  • you might use some tricks on the address (but your system might have ASLR), or keep near each memory zone a word of meta-data describing the chunk of which it is a member.
  • a significant issue is, given some address previously returned by malloc and argument of free, to find out the allocated size of that memory zone. You could manipulate the address bits, you could store that size in the word before, you could use some hash table, etc. Details are tricky.

Notice that details are tricky, and it might be quite hard to write a malloc implementation better than your system one. In practice, writing a good malloc is not a simple task. You should find many academic papers on the subject.

Look also into garbage collection techniques. Consider perhaps Boehm's conservative GC: you will replace malloc by GC_MALLOC and you won't bother about free... Learn about memory pools.

查看更多
够拽才男人
3楼-- · 2020-06-19 08:00

There are 3 ways to improve:

  • make it more robust
  • optimise the way memory is allocated
  • optimise the code that allocates the memory

Make It More Robust

There are many common programmer mistakes that could be easily detected (e.g. modifying data beyond the end of the allocated block). For a very simple (and relatively fast) example, it's possible to insert "canaries" before and after the allocated block, and detect programmer during free and realloc by checking if the canaries are correct (if they aren't then the programmer has trashed something by mistake). This only works "sometimes maybe". The problem is that (for a simple implementation of malloc) the meta-data is mixed in with the allocated blocks; so there's a chance that the meta-data has been corrupted even if the canaries haven't. To fix that it'd be a good idea to separate the meta-data from the allocated blocks. Also, merely reporting "something was corrupted" doesn't help as much as you'd hope. Ideally you'd want to have some sort of identifier for each block (e.g. the name of the function that allocated it) so that if/when problems occur you can report what was corrupted. Of course this could/should maybe be done via. a macro, so that those identifiers can be omitted when not debugging.

The main problem here is that the interface provided by malloc is lame and broken - there's simply no way to return acceptable error conditions ("failed to allocate" is the only error it can return) and no way to pass additional information. You'd want something more like int malloc(void **outPointer, size_t size, char *identifier) (with similar alterations to free and realloc to enable them to return a status code and identifier).

Optimise the Way Memory is Allocated

It's naive to assume that all memory is the same. It's not. Cache locality (including TLB locality) and other cache effects, and things like NUMA optimisation, all matter. For a simple example, imagine you're writing an application that has a structure describing a person (including a hash of their name) and a pointer to a person's name string; and both the structure and the name string are allocated via. malloc. The normal end result is that those structures and strings end up mixed together in the heap; so when you're searching through these structures (e.g. trying to find the structure that contains the correct hash) you end up pounding caches and TLBs. To optimise this properly you'd want to ensure that all the structures are close together in the heap. For that to work malloc needs to the difference between allocating 32 bytes for the structure and allocating 32 bytes for the name string. You need to introduce the concept of "memory pools" (e.g. where everything in "memory pool number 1" is kept close in the heap).

Another important optimisations include "cache colouring" (see http://en.wikipedia.org/wiki/Cache_coloring ). For NUMA systems, it can be important to know the difference between something where max. bandwidth is needed (where using memory from multiple NUMA domains increases bandwidth).

Finally, it'd be nice (to manage heap fragmentation, etc) to use different strategies for "temporary, likely to be freed soon" allocations and longer term allocations (e.g. where it's worth doing a extra to minimise fragmentation and wasted space/RAM).

Note: I'd estimate that getting all of this right can mean software running up to 20% faster in specific cases, due to far less cache misses, more bandwidth where it's needed, etc.

The main problem here is that the interface provided by malloc is lame and broken - there's simply no way to pass the additional information to malloc in the first place. You'd want something more like int malloc(void **outPointer, size_t size, char *identifier, int pool, int optimisationFlags) (with similar alterations to realloc).

Optimise the Code That Allocates the Memory

Given that you can assume memory is used more frequently than its allocated; this is the least important (e.g. less important than getting things like cache locality for the allocated blocks right).

Quite frankly, anyone that actually wants decent performance or decent debugging shouldn't be using malloc to begin with - generic solutions to specific problems are never ideal. With this in mind (and not forgetting that the interface to malloc is lame and broken and prevents everything that's important anyway) I'd recommend simply not bothering with malloc and creating something that's actually good (but non-standard) instead. For this you can adapt the algorithms used by existing implementations of malloc.

查看更多
登录 后发表回答