I have a question on problem 13.9 in the book, "cracking the coding interview".
The question is to write an aligned alloc and free function that supports allocating memory, and in the answer the code is given below:
void *aligned_malloc(size_t required_bytes, size_t alignment) {
void *p1;
void **p2;
int offset=alignment-1+sizeof(void*);
if((p1=(void*)malloc(required_bytes+offset))==NULL)
return NULL;
p2=(void**)(((size_t)(p1)+offset)&~(alignment-1)); //line 5
p2[-1]=p1; //line 6
return p2;
}
I am so confused with the line 5 and line 6. Why do you have to do an "and" since you have already add offset to p1? and what does [-1] mean? Thanks for the help in advance.
Your sample code is not complete. It allocates nothing. It is pretty obvious you are missing a malloc statement, which sets the p1 pointer. I don't have the book, but I think the complete code should goes along these lines:
void *aligned_malloc(size_t required_bytes, size_t alignment) {
void *p1;
void **p2;
int offset=alignment-1+sizeof(void*);
p1 = malloc(required_bytes + offset); // the line you are missing
p2=(void**)(((size_t)(p1)+offset)&~(alignment-1)); //line 5
p2[-1]=p1; //line 6
return p2;
}
So ... what does the code do?
- The strategy is to malloc more space than what we need (into p1), and return a p2 pointer somewhere after the beginning of the buffer.
- Since alignment is a power of two, in binary it has to form of 1 followed by zeros. e.g. if alignment is 32, it will be 00100000 in binary
- (alignment-1) in binary format will turn the 1 into 0, and all the 0's after the 1 into 1. For example: (32-1) is 00011111
- the ~ will reverse all the bits. That is: 11100000
- now, p1 is a pointer to the buffer (remember, the buffer is larger by offset than what we need). we add offset to p1: p1+offset.
- So now, (p1+offset) is greater that what we want to return. We'll nil all the insignificant bits by doing a bitwise and: (p1+offset) & ~(offset-1)
- This is p2, the pointer that we want to return. Note that because its last 5 digits are zero it is 32 aligned, as requested.
- p2 is what we'll return. However, we must be able to reach p1 when the user calls aligned_free. For this, note that we reserved location for one extra pointer when we calculated the offset (that's the sizeof(void*) in line 4.
- so, we want to put p1 immediately before p2. This is p2[-1]. This is a little bit tricky calculation. Remember that p2 is defined as void*. One way to look at it is as array of void. C array calculation say that p2[0] is exactly p2. p2[1] is p2 + sizeof(void*). In general p2[n] = p2 + n*sizeof(void*). The compiler also supports negative numbers, so p2[-1] is one void* (typically 4 bytes) before p2.
I'm going to guess that aligned_free is something like:
void aligned_free( void* p ) {
void* p1 = ((void**)p)[-1]; // get the pointer to the buffer we allocated
free( p1 );
}
p1 is the actual allocation. p2 is the pointer being returned, which references memory past the point of allocation and leaves enough space for both allignment AND storing the actual allocated pointer in the first place. when aligned_free() is called, p1 will be retrieved to do the "real" free().
Regarding the bit math, that gets a little more cumbersome, but it works.
p2=(void**)(((size_t)(p1)+offset)&~(alignment-1)); //line 5
Remember, p1 is the actual allocation reference. For kicks, lets assume the following, with 32bit pointers:
alignment = 64 bytes, 0x40
offset = 0x40-1+4 = 0x43
p1 = 0x20000110, a value returned from the stock malloc()
What is important is the original malloc()
that allocates an additional 0x43 bytes of space above and beyond the original request. This is to ensure both the alignment math and the space for the 32bit pointer can be accounted for:
p2=(void**)(((size_t)(p1)+offset)&~(alignment-1)); //line 5
p2 = (0x20000110 + 0x43) &~ (0x0000003F)
p2 = 0x20000153 &~ 0x0000003F
p2 = 0x20000153 & 0xFFFFFFC0
p2 = 0x20000140
p2 is aligned on a 0x40 boundary (i.e. all bits in 0x3F are 0) and enough space is left behind to store the 4-byte pointer for the original allocation, referenced by p1.
It has been forever since i did alignment math, so if i f'ed up the bits, please someone correct this.
And by the way, this is not the only way to do this.
void* align_malloc(size_t size, size_t alignment)
{
// sanity check for size/alignment.
// Make sure alignment is power of 2 (alignment&(alignment-1) ==0)
// allocate enough buffer to accommodate alignment and metadata info
// We want to store an offset to address where CRT reserved memory to avoid leak
size_t total = size+alignment+sizeof(size_t);
void* crtAlloc = malloc(total);
crtAlloc += sizeof(size_t); // make sure we have enough buffer ahead to store metadata info
size_t crtArithmetic = reinterprete_cast<int>(crtAlloc); // treat as int for pointer arithmetic
void* pRet = crtAlloc + (alignment - (crtArithmetic%alignment));
size_t *pMetadata = reinterprete_cast<size_t*>(pRet);
pMetadata[-1] = pRet - crtArithmetic- sizeof(size_t);
return pRet;
}
As an example size = 20, alignement = 16 and crt malloc returned address 10. and assuming sizeof(size_t) as 4 byte
- total bytes to allocate = 20+16+4 = 40
- memory committed by crt = address 10 to 50
- first make space in front by adding sizeof(size_t) 4 bytes so you point at 14
- add offset to align which is 14 + (16-14%16) = 16
- move back sizeof(size_t) 4 bytes (i.e. 12) and treat that as size_t pointer and store offset 2 (=12-10) to point to crt malloc
start
Same way, align_free will cast void pointer to size_t pointer, move back one location, read value stored there and adjust offset to move to crt alloc beginning
void* align_free(void* ptr)
{
size_t* pMetadata = reinterprete_cast<size_t*> (ptr);
free(ptr-pMetadata[-1]);
}
Optimization: You don't need sizeof(size_t) extra allocation if your alignment was more than sizeof(size_t)