I am trying to optimize the memory allocation of my program by using entire pages at a time.
I am grabbing the page size like this: sysconf(_SC_PAGESIZE);
and am then calculating the total number of elements that will fit in a page like this: elements=pageSize/sizeof(Node);
My thinking was that when I actually go to malloc my memory I would use malloc(elements*sizeof(Node));
It seems like the multiplication and division of sifeof(Node) would cancel out, but with integer division, I do not believe that that is the case.
Is this the best way to malloc an entire page at a time?
Thanks
The malloc
function doesn't have any concept of pagesize. Unless you are allocating pages that are ALSO aligned to a page-boundary, you will not get ANY benefit from calling malloc
in this way. Just malloc
as many elements as you need, and stop worrying about micro-optimising something that almost certainly won't give you any benefit at all.
Yes, the Linux kernel does things like this all the time. There are two reasons for that:
- You don't want to allocate blocks LARGER than a page, since that significantly increases the risk of allocation failure.
- The kernel allocation is made on a per-page basis, rather than like the C library, which allocates a large amount of memory in one go, and then splits it into small components.
If you really want to allocate page-size amount of memory, then use the result from sysconf(_SC_PAGESIZE)
as your size argument. But it is almost certain that your allocation straddles two pages.
Your computation elements=pageSize/sizeof(Node);
doesn't take account of the malloc()
metadata that are added to any block/chunk of memory returned by malloc()
. In many cases, malloc()
will return a memory block likely aligned at least on min(sizeof(double),2 * sizeof(void *))
boundary (32 bytes is becoming quite common btw ...). If malloc()
gets a memory block aligned on a page, adds its chunk (with padding), and you write a full page size of data, the last bytes are off the first page: so you're ending up using 2 pages.
Want a whole page, just for you, without concerns about wasting memory, without using mmap()
/ VirtualAlloc()
as suggested in the comments ?
Here you are:
int ret;
void *ptr = NULL;
size_t page_size = sysconf(_SC_PAGESIZE);
ret = posix_memalign(&ptr, page_size, page_size);
if (ret != 0 || ptr == NULL) {
fprintf(stderr, "posix_memalign failed: %s\n", strerror(ret));
}
By the way, this is probably about micro-optimization.
You probably still haven't checked your Node
have a size multiple of a cache-line, nor how to improve cache-locality, nor found a way to reduce memory fragmentation. So you're probably going in the wrong way: make it works first, profil, optimize your algorithms, profil, micro-optimize at the last option.
The standards provide no guarantee that malloc
even has a concept of page size. However, it's not uncommon for malloc implementations to dole out entire pages when the allocation size requested is on the order of the page size (or larger).
There's certainly no harm in asking for an allocation that happens to be equal to the page size (or a multiple of the page size) and subdividing it yourself, though it is a little extra work. You might indeed get the behavior you desire, at least on some machines/compiler/library combinations. But you might not either. If you absolutely require page-sized allocations and/or page-aligned memory, you'll have to call an OS-specific API to get it.
If your question is about how to alloc whole memory pages: Use mmap()
, not malloc()
.
Reason:
malloc()
must always add some metadata to every allocation, so if you do malloc(4096)
it will definitely allocate more than a single page. mmap()
, on the other hand, is the kernel's API to map pages into your address space. It's what malloc()
uses under the hood.
If your question is about correct rounding: The usual trick to round a
up to a multiple of N
is to say rounded = (a + N-1)/N*N;
. By adding N-1
first, you ensure that the division will round up in all cases. In the case that a
is already a multiple of N
, the added N-1
will be without effect; in all other cases, you get one more than with rounded = a/N*N;
.
The C11 standard added the aligned_alloc call, so you can do something like:
#include <stdlib.h>
#include <unistd.h>
void *alloc_page( void )
{
long page_size = sysconf( _SC_PAGESIZE ); /* arguably could be a constant, #define, etc. */
return ( page_size > 0 ? aligned_alloc( page_size, page_size ) : NULL );
}
The problem with this approach, as others pointed out, is that typically the implementation of the standard alloc calls add some bookkeeping overhead that is stored just before the allocated memory. So, this allocation will usually straddle two pages: the returned page for you to use, and the very end of another page used by the allocator's bookkeeping.
That means when you free or realloc this memory, it may need to touch two pages rather than just the one. Also, if you allocate all or most of your memory this way, then you can "waste" a lot of virtual memory as roughly half of the pages allocated to your process at the OS level will only be used a tiny bit for the allocator's bookkeeping.
How important these issues are is hard to say generally, but preferably they would be avoided somehow. Unfortunately, I haven't figured out a clean, easy, and portable way to do that yet.
==============================
Addendum: If you could dynamically figure out malloc's memory overhead and assume it is always constant, then would asking for that much less usually give us what we want?
#include <stdlib.h>
#include <unistd.h>
/* decent default guesses (e.g. - Linux x64) */
static size_t Page_Size = 4096;
static size_t Malloc_Overhead = 32;
/* call once at beginning of program (i.e. - single thread, no allocs yet) */
int alloc_page_init( void )
{
int ret = -1;
long page_size = sysconf( _SC_PAGESIZE );
char *p1 = malloc( 1 );
char *p2 = malloc( 1 );
size_t malloc_overhead;
if ( page_size <= 0 || p1 == NULL || p2 == NULL )
goto FAIL;
malloc_overhead = ( size_t ) ( p2 > p1 ? p2 - p1 : p1 - p2 ); /* non-standard pointer math */
if ( malloc_overhead > 64 || malloc_overhead >= page_size )
goto FAIL;
Page_Size = page_size;
Malloc_Overhead = malloc_overhead;
ret = 0;
FAIL:
if ( p1 )
free( p1 );
if ( p2 )
free( p2 );
return ret;
}
void *alloc_page( void )
{
return aligned_alloc( Page_Size - Malloc_Overhead, Page_Size - Malloc_Overhead );
}
Answer: probably not, because, for example, "As an example of the "supported by the implementation" requirement, POSIX function posix_memalign accepts any alignment that is a power of two and a multiple of sizeof(void *), and POSIX-based implementations of aligned_alloc inherit these requirements."
The above code would likely not request an alignment that is a power of 2 and will therefore likely fail on most platforms.
It seems this is an unavoidable problem with the typical implementations of standard allocation functions. So, it is probably best to just align and alloc based on the page size and likely pay the penalty of the allocator's bookkeeping residing on another page, or use an OS specific call like mmap to avoid this issue.