I am trying to optimize the memory allocation of my program by using entire pages at a time.
I am grabbing the page size like this: sysconf(_SC_PAGESIZE);
and am then calculating the total number of elements that will fit in a page like this: elements=pageSize/sizeof(Node);
My thinking was that when I actually go to malloc my memory I would use malloc(elements*sizeof(Node));
It seems like the multiplication and division of sifeof(Node) would cancel out, but with integer division, I do not believe that that is the case.
Is this the best way to malloc an entire page at a time?
Thanks
The standards provide no guarantee that
malloc
even has a concept of page size. However, it's not uncommon for malloc implementations to dole out entire pages when the allocation size requested is on the order of the page size (or larger).There's certainly no harm in asking for an allocation that happens to be equal to the page size (or a multiple of the page size) and subdividing it yourself, though it is a little extra work. You might indeed get the behavior you desire, at least on some machines/compiler/library combinations. But you might not either. If you absolutely require page-sized allocations and/or page-aligned memory, you'll have to call an OS-specific API to get it.
Your computation
elements=pageSize/sizeof(Node);
doesn't take account of themalloc()
metadata that are added to any block/chunk of memory returned bymalloc()
. In many cases,malloc()
will return a memory block likely aligned at least onmin(sizeof(double),2 * sizeof(void *))
boundary (32 bytes is becoming quite common btw ...). Ifmalloc()
gets a memory block aligned on a page, adds its chunk (with padding), and you write a full page size of data, the last bytes are off the first page: so you're ending up using 2 pages.Want a whole page, just for you, without concerns about wasting memory, without using
mmap()
/VirtualAlloc()
as suggested in the comments ? Here you are:By the way, this is probably about micro-optimization. You probably still haven't checked your
Node
have a size multiple of a cache-line, nor how to improve cache-locality, nor found a way to reduce memory fragmentation. So you're probably going in the wrong way: make it works first, profil, optimize your algorithms, profil, micro-optimize at the last option.If your question is about how to alloc whole memory pages: Use
mmap()
, notmalloc()
.Reason:
malloc()
must always add some metadata to every allocation, so if you domalloc(4096)
it will definitely allocate more than a single page.mmap()
, on the other hand, is the kernel's API to map pages into your address space. It's whatmalloc()
uses under the hood.If your question is about correct rounding: The usual trick to round
a
up to a multiple ofN
is to sayrounded = (a + N-1)/N*N;
. By addingN-1
first, you ensure that the division will round up in all cases. In the case thata
is already a multiple ofN
, the addedN-1
will be without effect; in all other cases, you get one more than withrounded = a/N*N;
.The C11 standard added the aligned_alloc call, so you can do something like:
The problem with this approach, as others pointed out, is that typically the implementation of the standard alloc calls add some bookkeeping overhead that is stored just before the allocated memory. So, this allocation will usually straddle two pages: the returned page for you to use, and the very end of another page used by the allocator's bookkeeping.
That means when you free or realloc this memory, it may need to touch two pages rather than just the one. Also, if you allocate all or most of your memory this way, then you can "waste" a lot of virtual memory as roughly half of the pages allocated to your process at the OS level will only be used a tiny bit for the allocator's bookkeeping.
How important these issues are is hard to say generally, but preferably they would be avoided somehow. Unfortunately, I haven't figured out a clean, easy, and portable way to do that yet.
==============================
Addendum: If you could dynamically figure out malloc's memory overhead and assume it is always constant, then would asking for that much less usually give us what we want?
Answer: probably not, because, for example, "As an example of the "supported by the implementation" requirement, POSIX function posix_memalign accepts any alignment that is a power of two and a multiple of sizeof(void *), and POSIX-based implementations of aligned_alloc inherit these requirements."
The above code would likely not request an alignment that is a power of 2 and will therefore likely fail on most platforms.
It seems this is an unavoidable problem with the typical implementations of standard allocation functions. So, it is probably best to just align and alloc based on the page size and likely pay the penalty of the allocator's bookkeeping residing on another page, or use an OS specific call like mmap to avoid this issue.
The
malloc
function doesn't have any concept of pagesize. Unless you are allocating pages that are ALSO aligned to a page-boundary, you will not get ANY benefit from callingmalloc
in this way. Justmalloc
as many elements as you need, and stop worrying about micro-optimising something that almost certainly won't give you any benefit at all.Yes, the Linux kernel does things like this all the time. There are two reasons for that:
If you really want to allocate page-size amount of memory, then use the result from
sysconf(_SC_PAGESIZE)
as your size argument. But it is almost certain that your allocation straddles two pages.