This is an excerpt from the book on C by Kernighan and Ritchie. It shows how to implement a version of malloc
. Although well commented, I am having great difficulty in understanding it. Can somebody please explain it?
typedef long Align; /* for alignment to long boundary */
union header { /* block header */
struct {
union header *ptr; /* next block if on free list */
unsigned size; /* size of this block */
} s;
Align x; /* force alignment of blocks */
};
typedef union header Header;
static Header base; /* empty list to get started */
static Header *freep = NULL; /* start of free list */
/* malloc: general-purpose storage allocator */
void *malloc(unsigned nbytes)
{
Header *p, *prevp;
Header *morecore(unsigned);
unsigned nunits;
nunits = (nbytes+sizeof(Header)-1)/sizeof(header) + 1;
if ((prevp = freep) == NULL) { /* no free list yet */
base.s.ptr = freeptr = prevptr = &base;
base.s.size = 0;
}
for (p = prevp->s.ptr; ; prevp = p, p = p->s.ptr) {
if (p->s.size >= nunits) { /* big enough */
if (p->s.size == nunits) /* exactly */
prevp->s.ptr = p->s.ptr;
else { /* allocate tail end */
p->s.size -= nunits;
p += p->s.size;
p->s.size = nunits
}
freep = prevp;
return (void *)(p+1);
}
if (p == freep) /* wrapped around free list */
if ((p = morecore(nunits)) == NULL)
return NULL; /* none left */
}
}
#define NALLOC 1024 /* minimum #units to request */
/* morecore: ask system for more memory */
static Header *morecore(unsigned nu)
{
char *cp, *sbrk(int);
Header *up;
if (nu < NALLOC)
nu = NALLOC;
cp = sbrk(nu * sizeof(Header));
if (cp == (char *) -1) /* no space at all */
return NULL;
up = (Header *) cp;
up->s.size = nu;
free((void *)(up+1));
return freep;
}
/* free: put block ap in free list */
void free(void *ap) {
Header *bp, *p;
bp = (Header *)ap - 1; /* point to block header */
for (p = freep; !(bp > p && bp < p->s.ptr); p = p->s.ptr)
if (p >= p->s.ptr && (bp > p || bp < p->s.ptr))
break; /* freed block at start or end of arena */
if (bp + bp->size == p->s.ptr) {
bp->s.size += p->s.ptr->s.size;
bp->s.ptr = p->s.ptr->s.ptr;
} else
bp->s.ptr = p->s.ptr;
if (p + p->size == bp) {
p->s.size += bp->s.size;
p->s.ptr = bp->s.ptr;
} else
p->s.ptr = bp;
freep = p;
}
The basic of malloc()
In Linux, there are two typical ways to request memory: sbrk and mmap. These system calls have severe limitations on frequent small allocations. malloc() is a library function to address this issue. It requests large chunks of memory with sbrk/mmap and returns small memory blocks inside large chunks. This is much more efficient and flexible than directly calling sbrk/mmap.
K&R malloc()
In the K&R implementation, a core (more commonly called arena) is a large chunk of memory.
morecore()
requests a core from system viasbrk()
. When you call malloc()/free() multiple times, some blocks in the cores are used/allocated while others are free. K&R malloc stores the addresses of free blocks in a circular single linked list. In this list, each node is a block of free memory. The firstsizeof(Header)
bytes keep the size of the block and the pointer to the next free block. The rest of bytes in the free block are uninitialized. Different from typical lists in textbooks, nodes in the free list are just pointers to some unused areas in cores; you don't actually allocate each node except for cores. This list is the key to the understanding of the algorithm.The following diagram shows an example memory layout with two cores/arenas. In the diagram, each character takes
sizeof(Header)
bytes.@
is aHeader
,+
marks allocated memory and-
marks free memory inside cores. In the example, there are three allocated blocks and three free blocks. The three free blocks are stored in the circular list. For the three allocated blocks, only their sizes are stored inHeader
.In your code,
freep
is an entry point to the free list. If you repeatedly followfreep->ptr
, you will come back tofreep
– it is circular. Once you understand the circular single-linked list, the rest is relatively easy.malloc()
finds a free block and possibly splits it.free()
adds a free block back to the list and may merge it to adjacent free blocks. They both try to maintain the structure of the list.Other comments on the implementation
The code comments mentioned "wrapped around" in
malloc()
. That line happens when you have traversed the entire free list but can't find a free block larger than the requested length. In this case, you have to add a new core withmorecore()
.base
is a zero-sized block that is always included in the free list. It is a trick to avoid special casing. It is not strictly necessary.free()
may look a little complex because it has to consider four different cases to merge a newly freed block to other free blocks in the list. This detail is not that important unless you want to reimplement by yourself.This blog post explains K&R malloc in more details.
PS: K&R malloc is one of the most elegant pieces of code in my view. It was really eye opening when I first understood the code. It makes me sad that some modern programmers, not even understanding the basic of this implementation, are calling the masterpiece crap solely based on its coding style.
Ok, what we have here is a chunk of really poorly written code. What I will do in this post could best be described as software archaeology.
Step 1: fix the formatting.
The indention and compact format doesn't do anyone any good. Various spaces and empty rows need to be inserted. The comments could be written in more readable ways. I'll start by fixing that.
At the same time I'm changing the brace style from K&R style - please note that the K&R brace style is acceptable, this is merely a personal preference of mine. Another personal preference is to write the * for pointers next to the type pointed at. I'll not argue about (subjective) style matters here.
Also, the type definition of
Header
is completely unreadable, it needs a drastic fix.And I spotted something completely obscure: they seem to have declared a function prototype inside the function.
Header* morecore(unsigned);
. This is very old and very poor style, and I'm not sure if C even allows it any longer. Lets just remove that line, whatever that function does, it will have to be defined elsewhere.Ok now we might actually be able to read the code.
Step 2: weed out widely-recognized bad practice.
This code is filled with things that are nowadays regarded as bad practice. They need to be removed, since they jeopardize the safety, readability and maintenance of the code. If you want a reference to an authority preaching the same practices as me, check out the widely-recognized coding standard MISRA-C.
I have spotted and removed the following bad practices:
1) Just typing
unsigned
in the code could lead to be confusion: was this a typo by the programmer or was the intention to writeunsigned int
? We should replace allunsigned
withunsigned int
. But as we do that, we find that it is used in this context to give the size of various binary data. The correct type to use for such matters is the C standard typesize_t
. This is essentially just an unsigned int as well, but it is guaranteed to be "large enough" for the particular platform. Thesizeof
operator returns a result of typesize_t
and if we look at the C standard's definition of the real malloc, it isvoid *malloc(size_t size);
. Sosize_t
is the most correct type to use.2) It is a bad idea to use the same name for our own malloc function as the one residing in stdlib.h. Should we need to include stdlib.h, things will get messy. As a rule of thumb, never use identifier names of C standard library functions in your own code. I'll change the name to kr_malloc.
3) The code is abusing the fact that all static variables are guaranteed to be initialized to zero. This is well-defined by the C standard, but a rather subtle rule. Lets initialize all statics explicitly, to show that we haven't forgotten to init them by accident.
4) Assignment inside conditions is dangerous and hard to read. This should be avoided if possible, since it can also lead to bugs, such as the classic = vs == bug.
5) Multiple assignments on the same row is hard to read, and also possibly dangerous, because of the order of evaluation.
6) Multiple declarations on the same row is hard to read, and dangerous, since it could lead to bugs when mixing data and pointer declarations. Always declare each variable on a row of its own.
7) Always uses braces after every statement. Not doing so will lead to bugs bugs bugs.
8) Never type cast from a specific pointer type to void*. It is unnecessary in C, and could hide away bugs that the compiler would otherwise have detected.
9) Avoid using multiple return statements inside a function. Sometimes they lead to clearer code, but in most cases they lead to spaghetti. As the code stands, we can't change that without rewriting the loop though, so I will fix this later.
10) Keep for loops simple. They should contain one init statement, one loop condition and one iteration, nothing else. This for loop, with the comma operator and everything, is very obscure. Again, we spot a need to rewrite this loop into something sane. I'll do this next, but for now we have:
Step 3: rewrite the obscure loop.
For the reasons mentioned earlier. We can see that this loop goes on forever, it terminates by returning from the function, either when the allocation is done, or when there is no memory left. So lets create that as a loop condition, and lift out the return to the end of the function where it should be. And lets get rid of that ugly comma operator.
I'll introduce two new variables: one result variable to hold the resulting pointer, and another to keep track of whether the loop should continue or not. I'll blow K&R's minds by using the
bool
type, which is part of the C language since 1999.(I hope I haven't altered the algorithm with this change, I believe I haven't)
Step 4: make this crap compile.
Since this is from K&R, it is filled with typos.
sizeof(header)
should besizeof(Header)
. There are missing semi-colons. They use different names freep, prevp versus freeptr, prevptr, but clearly mean the same variable. I believe the latter were actually better names, so lets use those.And now we have somewhat readable, maintainable code, without numerous dangerous practices, that will even compile! So now we could actually start to ponder about what the code is actually doing.
The struct "Header" is, as you might have guessed, the declaration of a node in a linked list. Each such node contains a pointer to the next one. I don't quite understand the morecore function, nor the "wrap-around", I have never used this function, nor
sbrk
. But I assume that it allocates a header as specified in this struct, and also some chunk of raw data following that header. If so, that explains why there is no actual data pointer: the data is assumed to follow the header, adjacently in memory. So for each node, we get the header, and we get a chunk of raw data following the header.The iteration itself is pretty straight-forward, they are going through a single-linked list, one node at a time.
At the end of the loop, they set the pointer to point one past the end of the "chunk", then store that in a static variable, so that the program will remember where it previously allocated memory, next time the function is called.
They are using a trick to make their header end up on an aligned memory address: they store all the overhead info in a union together with a variable large enough to correspond to the platform's alignment requirement. So if the size of "ptr" plus the size of "size" are too small to give the exact alignment, the union guarantees that at least sizeof(Align) bytes are allocated. I believe that this whole trick is obsolete today, since the C standard mandates automatic struct/union padding.
I'm studying K&R as I'd imagine OP was when he asked this question, and I came here because I also found these implementations to be confusing. While the accepted answer is very detailed and helpful, I tried to take a different tack which was to understand the code as it was originally written - I've gone through the code and added comments to the sections of the code that were difficult to me. This includes code for the other routines in the section (which are the functions
free
andmemcore
- I've renamed themkandr_malloc
andkandr_free
to avoid conflicts with the stdlib). I thought I would leave this here as a supplement to the accepted answer, for other students who may find it helpful.I acknowledge that the comments in this code are excessive. Please know that I am only doing this as a learning exercise and I am not proposing that this is a good way to actually write code.
I took the liberty of changing some variable names to ones that seemed more intuitive to me; other than that the code is essentially left intact. It seems to compile and run fine for the test programs that I used, although valgrind had complaints for some applications.
Also: some of the text in the comments is lifted directly from K&R or the man pages - I do not intend to take any credit for these sections.
I also found this exercise great and interesting.
In my opinion visualizing the structure may help a lot with understanding the logic - or at least this worked for me. Below is my code, which prints as much as possible about the flow of the K&R malloc.
The most significant change I made in the K&R malloc is the change of 'free' to make sure some old pointer will not be used again. Other than that I added comments and fixed some small typos.
Experimenting with NALLOC, MAXMEM and the test variables in 'main' could be also of help.
On my computer (Ubuntu 16.04.3) this compiled without errors with:
krmalloc.c :