I'm writing my own simple malloc()
function and I would like to create more faster and efficient variant. I'm writed function that use linear search and allocated sequentially and contiguously in memory.
What is the next step to improve this algorithm? What are the main shortcomings of my current version? I would be very grateful for any feedback and recommendation.
typedef struct heap_block
{
struct heap_block* next;
size_t size;
bool isfree;
}header;
#define Heap_Capacity 100000
static char heap[Heap_Capacity];
size_t heap_size;
void* malloc(size_t sz)
{
if(sz == 0 || sz > Heap_Capacity) { return NULL; }
header* block = (header*)heap;
if(heap_size == 0)
{
set_data_to_block(block, sz);
return (void*)(block+1);
}
while(block->next != NULL) { block = block->next; }
block->next = (header*)((char*)to_end_data(block) + 8);
header* new_block = block->next;
set_data_to_block(new_block, sz);
return (void*)(new_block+1);
}
void set_data_to_block(header* block, size_t sz)
{
block->size = sz;
block->isfree = false;
block->next = NULL;
heap_size += sz;
}
header* to_end_data(header* block)
{
return (header*)((size_t)(block+1) + block->size);
}
Notice that
malloc
is often built above lower level memory related syscalls (e.g. mmap(2) on Linux). See this answer which mentions GNUglibc
andmusl-libc
. Look also inside tcmalloc, so study the source code of several free software malloc implementations.Some general ideas for your
malloc
:mmap
(and release it back to the OS kernel eventually withmunmap
). You certainly should not allocate a fixed size heap (since on a 64 bits computer with 128Gbytes of RAM you would want to succeed inmalloc
-ing a 10 billion bytes zone).malloc
of 16 bytes frommalloc
of a megabyte. The typical threshold between small and large allocation is generally a small multiple of the page size (which is often 4Kbytes). Small allocations happen inside pages. Large allocations are rounded to pages. You might even handle very speciallymalloc
of two words (like in many linked lists).free
-d zones of same (small) size to reuse them in future calls tomalloc
.malloc
and argument offree
, to find out the allocated size of that memory zone. You could manipulate the address bits, you could store that size in the word before, you could use some hash table, etc. Details are tricky.Notice that details are tricky, and it might be quite hard to write a
malloc
implementation better than your system one. In practice, writing a goodmalloc
is not a simple task. You should find many academic papers on the subject.Look also into garbage collection techniques. Consider perhaps Boehm's conservative GC: you will replace
malloc
byGC_MALLOC
and you won't bother aboutfree
... Learn about memory pools.There are 3 ways to improve:
Make It More Robust
There are many common programmer mistakes that could be easily detected (e.g. modifying data beyond the end of the allocated block). For a very simple (and relatively fast) example, it's possible to insert "canaries" before and after the allocated block, and detect programmer during
free
andrealloc
by checking if the canaries are correct (if they aren't then the programmer has trashed something by mistake). This only works "sometimes maybe". The problem is that (for a simple implementation ofmalloc
) the meta-data is mixed in with the allocated blocks; so there's a chance that the meta-data has been corrupted even if the canaries haven't. To fix that it'd be a good idea to separate the meta-data from the allocated blocks. Also, merely reporting "something was corrupted" doesn't help as much as you'd hope. Ideally you'd want to have some sort of identifier for each block (e.g. the name of the function that allocated it) so that if/when problems occur you can report what was corrupted. Of course this could/should maybe be done via. a macro, so that those identifiers can be omitted when not debugging.The main problem here is that the interface provided by
malloc
is lame and broken - there's simply no way to return acceptable error conditions ("failed to allocate" is the only error it can return) and no way to pass additional information. You'd want something more likeint malloc(void **outPointer, size_t size, char *identifier)
(with similar alterations tofree
andrealloc
to enable them to return a status code and identifier).Optimise the Way Memory is Allocated
It's naive to assume that all memory is the same. It's not. Cache locality (including TLB locality) and other cache effects, and things like NUMA optimisation, all matter. For a simple example, imagine you're writing an application that has a structure describing a person (including a hash of their name) and a pointer to a person's name string; and both the structure and the name string are allocated via. malloc. The normal end result is that those structures and strings end up mixed together in the heap; so when you're searching through these structures (e.g. trying to find the structure that contains the correct hash) you end up pounding caches and TLBs. To optimise this properly you'd want to ensure that all the structures are close together in the heap. For that to work
malloc
needs to the difference between allocating 32 bytes for the structure and allocating 32 bytes for the name string. You need to introduce the concept of "memory pools" (e.g. where everything in "memory pool number 1" is kept close in the heap).Another important optimisations include "cache colouring" (see http://en.wikipedia.org/wiki/Cache_coloring ). For NUMA systems, it can be important to know the difference between something where max. bandwidth is needed (where using memory from multiple NUMA domains increases bandwidth).
Finally, it'd be nice (to manage heap fragmentation, etc) to use different strategies for "temporary, likely to be freed soon" allocations and longer term allocations (e.g. where it's worth doing a extra to minimise fragmentation and wasted space/RAM).
Note: I'd estimate that getting all of this right can mean software running up to 20% faster in specific cases, due to far less cache misses, more bandwidth where it's needed, etc.
The main problem here is that the interface provided by
malloc
is lame and broken - there's simply no way to pass the additional information tomalloc
in the first place. You'd want something more likeint malloc(void **outPointer, size_t size, char *identifier, int pool, int optimisationFlags)
(with similar alterations torealloc
).Optimise the Code That Allocates the Memory
Given that you can assume memory is used more frequently than its allocated; this is the least important (e.g. less important than getting things like cache locality for the allocated blocks right).
Quite frankly, anyone that actually wants decent performance or decent debugging shouldn't be using
malloc
to begin with - generic solutions to specific problems are never ideal. With this in mind (and not forgetting that the interface tomalloc
is lame and broken and prevents everything that's important anyway) I'd recommend simply not bothering withmalloc
and creating something that's actually good (but non-standard) instead. For this you can adapt the algorithms used by existing implementations ofmalloc
.