Is it better to allocate memory in powers of two?

Posted 2019-01-17 02:56

Question:

When we use malloc() to allocate memory, should we request a size that is a power of two, or just the exact size that we need?
Like this:

//char *ptr = malloc(200);
char *ptr = malloc(256);   // instead of 200 we use 256

If it is better to request a size that is a power of two, what is the reason for that? Why is it better?

Thanks

Edit

The reason for my confusion is the following quote from Joel's blog post Back to Basics:

Smart programmers minimize the potential distruption of malloc by always allocating blocks of memory that are powers of 2 in size. You know, 4 bytes, 8 bytes, 16 bytes, 18446744073709551616 bytes, etc. For reasons that should be intuitive to anyone who plays with Lego, this minimizes the amount of weird fragmentation that goes on in the free chain. Although it may seem like this wastes space, it is also easy to see how it never wastes more than 50% of the space. So your program uses no more than twice as much memory as it needs to, which is not that big a deal.

Sorry, I should have posted the above quote earlier. My apologies!

Most replies so far say that allocating memory in powers of two is a bad idea. In which scenario, then, is it better to follow Joel's point about malloc()? Why did he say that? Is the suggestion quoted above obsolete now?

Kindly explain it.
Thanks

Answer 1:

Just give the exact size you need. The only reason that a power-of-two size might be "better" is to allow quicker allocation and/or to avoid memory fragmentation.

However, any non-trivial malloc implementation that concerns itself with being efficient will internally round allocations up in this way if and when it is appropriate to do so. You don't need to concern yourself with "helping" malloc; malloc can do just fine on its own.

Edit:

In response to your quote of the Joel on Software article, Joel's point in that section (which is hard to correctly discern without the context that follows the paragraph that you quoted) is that if you are expecting to frequently re-allocate a buffer, it's better to do so multiplicatively, rather than additively. This is, in fact, exactly what the std::string and std::vector classes in C++ (among others) do.

The reason that this is an improvement is not because you are helping out malloc by providing convenient numbers, but because memory allocation is an expensive operation, and you are trying to minimize the number of times you do it. Joel is presenting a concrete example of the idea of a time-space tradeoff. He's arguing that, in many cases where the amount of memory needed changes dynamically, it's better to waste some space (by allocating up to twice as much as you need at each expansion) in order to save the time that would be required to repeatedly tack on exactly n bytes of memory, every time you need n more bytes.

The multiplier doesn't have to be two: you could allocate up to three times as much space as you need and end up with allocations in powers of three, or allocate up to fifty-seven times as much space as you need and end up with allocations in powers of fifty-seven. The more over-allocation you do, the less frequently you will need to re-allocate, but the more memory you will waste. Allocating in powers of two, which uses at most twice as much memory as needed, just happens to be a good starting-point tradeoff until and unless you have a better idea of exactly what your needs are.
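To make the idea concrete, here is a minimal sketch of such a multiplicatively growing buffer in C. The type and function names (buf_t, buf_append) are invented for this illustration; they are not from Joel's article or any particular library:

#include <stdlib.h>
#include <string.h>

/* Hypothetical growable byte buffer: capacity doubles whenever it runs out. */
typedef struct {
    char   *data;
    size_t  len;   /* bytes in use    */
    size_t  cap;   /* bytes allocated */
} buf_t;

/* Append n bytes, growing multiplicatively so that building a buffer of N
 * bytes costs O(log N) calls to realloc() rather than O(N).
 * Returns 0 on success, -1 on allocation failure (buffer left intact). */
int buf_append(buf_t *b, const char *src, size_t n)
{
    if (b->len + n > b->cap) {
        size_t newcap = b->cap ? b->cap : 16;   /* arbitrary starting size */
        while (newcap < b->len + n)
            newcap *= 2;                        /* the multiplicative step */
        char *p = realloc(b->data, newcap);
        if (p == NULL)
            return -1;
        b->data = p;
        b->cap  = newcap;
    }
    memcpy(b->data + b->len, src, n);
    b->len += n;
    return 0;
}

A buf_t zero-initialized with {0} starts empty; since realloc(NULL, size) behaves like malloc(size), the first append allocates the initial block.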

He does mention in passing that this helps reduce "fragmentation in the free chain", but the reason for that is more because of the number and uniformity of allocations being done, rather than their exact size. For one thing, the more times you allocate and deallocate memory, the more likely you are to fragment the heap, no matter what size you're allocating in. Secondly, if you have multiple buffers that you are dynamically resizing using the same multiplicative resizing algorithm, then it's likely that if one resizes from 32 to 64, and another resizes from 16 to 32, then the second's reallocation can fit right where the first one used to be. This wouldn't be the case if one resized from 25 to 60 and the other from 16 to 26.

And again, none of what he's talking about applies if you're going to be doing the allocation step only once.



Answer 2:

Just to play devil's advocate, here's how Qt does it:

Let's assume that we append 15000 characters to the QString string. Then the following 18 reallocations (out of a possible 15000) occur when QString runs out of space: 4, 8, 12, 16, 20, 52, 116, 244, 500, 1012, 2036, 4084, 6132, 8180, 10228, 12276, 14324, 16372. At the end, the QString has 16372 Unicode characters allocated, 15000 of which are occupied.

The values above may seem a bit strange, but here are the guiding principles:

- QString allocates 4 characters at a time until it reaches size 20.
- From 20 to 4084, it advances by doubling the size each time. More precisely, it advances to the next power of two, minus 12. (Some memory allocators perform worst when requested exact powers of two, because they use a few bytes per block for book-keeping.)
- From 4084 on, it advances by blocks of 2048 characters (4096 bytes). This makes sense because modern operating systems don't copy the entire data when reallocating a buffer; the physical memory pages are simply reordered, and only the data on the first and last pages actually needs to be copied.

I like the way they anticipate operating system features in code that is meant to perform well from smartphones to server farms. Given that they're smarter people than me, I'd assume that said feature is available in all modern OSes.
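Out of curiosity, here is a small sketch of how those growth rules play out. This is my own reading of the quoted description, not actual Qt source code, and the helper name next_capacity is invented for the example:

#include <stdio.h>

/* Growth rule as described in the quote above (sizes are in characters):
 * grow 4 at a time up to 20, then to the next power of two minus 12
 * up to 4084, then in 2048-character steps (still minus 12). */
static size_t next_capacity(size_t needed)
{
    if (needed <= 20)
        return (needed + 3) / 4 * 4;                 /* 4, 8, 12, 16, 20   */
    if (needed <= 4084) {
        size_t cap = 64;                             /* 64 - 12 = 52 is the
                                                        first step past 20 */
        while (cap - 12 < needed)
            cap *= 2;
        return cap - 12;                             /* 52, 116, ..., 4084 */
    }
    return (needed + 12 + 2047) / 2048 * 2048 - 12;  /* 6132, 8180, ...    */
}

int main(void)
{
    /* Reproduce the reallocation sequence for appending 15000 characters
     * one at a time. */
    size_t cap = 0;
    for (size_t len = 1; len <= 15000; ++len) {
        if (len > cap) {
            cap = next_capacity(len);
            printf("%zu ", cap);
        }
    }
    putchar('\n');
    return 0;
}

Compiled and run, this prints the same 18 sizes listed in the quote, ending at 16372.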



Answer 3:

It might have been true once, but it's certainly not better.

Just allocate the memory you need, when you need it and free it up as soon as you've finished.

There are far too many programs that are profligate with resources - don't make yours one of them.



Answer 4:

It's somewhat irrelevant.

malloc() actually allocates slightly more memory than you request, because it has its own headers to deal with. Therefore the optimal storage is probably something like 4 KB minus 12 bytes... but that varies depending on the implementation.

In any case, there is no reason for you to round up to more storage than you need as an optimization technique.



Answer 5:

You may want to allocate memory in terms of the processor's word size; not any old power of 2 will do.

If the processor has a 32-bit word (4 bytes), then allocate in units of 4 bytes. Allocating in terms of 2 bytes may not be helpful since the processor prefers data to start on a 4 byte boundary.

On the other hand, this may be a micro-optimization. Most memory allocation libraries are set up to return memory that is aligned at the correct position and will leave the least amount of fragmentation. If you allocate 15 bytes, the library may pad out and allocate 16 bytes. Some memory allocators have different pools based on the allocation size.
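If you're curious what your allocator actually hands back, glibc exposes the non-standard malloc_usable_size() for exactly this kind of inspection (other platforms have different calls, e.g. malloc_size on macOS or _msize with MSVC, so treat this as a glibc-specific sketch):

#include <stdio.h>
#include <stdlib.h>
#include <malloc.h>   /* malloc_usable_size() -- glibc extension */

int main(void)
{
    /* Request "awkward" sizes and print how much the allocator really gave us. */
    for (size_t req = 1; req <= 100; req += 13) {
        void *p = malloc(req);
        if (p == NULL)
            return 1;
        printf("requested %3zu bytes, usable %3zu bytes\n",
               req, malloc_usable_size(p));
        free(p);
    }
    return 0;
}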

In summary, allocate the amount of memory that you need. Let the allocation library / manager handle the actual amount for you. Put more energy into correctness and robustness than into worrying about these trivial issues.



Answer 6:

When I'm allocating a buffer that may need to keep growing to accommodate as-yet-unknown-size data, I start with a power of 2 minus 1, and every time it runs out of space, I realloc with twice the previous size plus 1. This makes it so I never have to worry about integer overflows; the size can only overflow when the previous size was SIZE_MAX, at which point the allocation would already have failed, and 2*SIZE_MAX+1 == SIZE_MAX anyway.

In contrast, if I just used a power of 2 and doubled it each time, I might successfully get a 2^31-byte buffer (on a system where SIZE_MAX is 2^32 - 1) and then reallocate to a 0-byte buffer the next time I doubled the size.
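A minimal sketch of that growth step (the helper name grow is mine, not from any library):

#include <stdlib.h>

/* Double-plus-one growth as described above: sizes stay of the form 2^n - 1,
 * and since 2*SIZE_MAX + 1 wraps back to SIZE_MAX in unsigned arithmetic,
 * the size computation can never overflow into a too-small value.
 * Returns 0 on success, -1 on failure (in which case *buf is untouched). */
int grow(char **buf, size_t *size)
{
    size_t newsize = 2 * *size + 1;
    char *p = realloc(*buf, newsize);
    if (p == NULL)
        return -1;
    *buf = p;
    *size = newsize;
    return 0;
}

Starting with, say, size = 15 and buf = malloc(size), you call grow(&buf, &size) whenever the buffer fills up.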

As some people have commented about power-of-2-minus-12 being good for certain malloc implementations, one could equally start with a power of 2 minus 12, then double it and add 12 at each step...

On the other hand if you're just allocating small buffers that won't need to grow, request exactly the size you need. Don't try to second-guess what's good for malloc.



Answer 7:

This is totally dependent on the given libc implementation of malloc(3). It's up to that implementation to reserve heap chunks in whatever order it sees fit.

To answer the question - no, it's not "better" (and by "better" here you mean...?). If the size you ask for is too small, malloc(3) will reserve a bigger chunk internally anyway, so just stick with your exact size.



Answer 8:

With today's memory sizes and speeds, I don't think it's relevant anymore.

Furthermore, if you're going to allocate memory frequently, you'd better consider custom memory pooling / pre-allocation, along the lines of the sketch below.
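For what it's worth, a pre-allocated pool can be as simple as the following sketch. This is a toy fixed-size free-list pool; all the names here are invented for illustration, and a real one would also need thread safety, growth, and alignment handling:

#include <stddef.h>
#include <stdlib.h>

/* Toy fixed-size pool: carve one big malloc into NODE_COUNT equal slots
 * and hand them out from a free list, so the hot path never calls malloc. */
#define SLOT_SIZE  64
#define NODE_COUNT 1024

typedef union slot {
    union slot *next;            /* valid while the slot is free */
    char        bytes[SLOT_SIZE];
} slot_t;

static slot_t *pool;             /* backing storage        */
static slot_t *free_list;        /* head of the free list  */

int pool_init(void)
{
    pool = malloc(NODE_COUNT * sizeof *pool);
    if (pool == NULL)
        return -1;
    for (size_t i = 0; i < NODE_COUNT - 1; ++i)
        pool[i].next = &pool[i + 1];
    pool[NODE_COUNT - 1].next = NULL;
    free_list = pool;
    return 0;
}

void *pool_alloc(void)           /* O(1), no malloc call; NULL when empty */
{
    slot_t *s = free_list;
    if (s)
        free_list = s->next;
    return s;
}

void pool_free(void *p)          /* O(1), no free call */
{
    slot_t *s = p;
    s->next = free_list;
    free_list = s;
}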



Answer 9:

There is always testing...

You can write a small sample program that allocates memory in a loop. That way you can see whether your malloc implementation silently rounds allocations up to powers of 2. With that information, you can try to allocate the same total amount of memory using the two strategies: randomly sized blocks and power-of-2-sized blocks.
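Something along these lines could serve as that sample program. It is only a rough sketch: the timings depend entirely on the malloc implementation, and clock() is a crude measure:

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N 100000

/* Round n up to the next power of two (for n >= 1). */
static size_t round_up_pow2(size_t n)
{
    size_t p = 1;
    while (p < n)
        p <<= 1;
    return p;
}

/* Allocate N blocks of pseudo-random size, then free them, and report time. */
static double run(int round_up)
{
    static void *blocks[N];
    clock_t start = clock();
    for (int i = 0; i < N; ++i) {
        size_t want = (size_t)(rand() % 500 + 1);
        blocks[i] = malloc(round_up ? round_up_pow2(want) : want);
    }
    for (int i = 0; i < N; ++i)
        free(blocks[i]);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

int main(void)
{
    srand(42);
    printf("exact sizes:      %.3f s\n", run(0));
    srand(42);                       /* same size sequence for both runs */
    printf("power-of-2 sizes: %.3f s\n", run(1));
    return 0;
}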

I would only expect differences, if any, for large amounts of memory though.



Answer 10:

If you're allocating some sort of expandable buffer where you need to pick some number for initial allocations, then yes, powers of 2 are good numbers to choose. If you need to allocate memory for struct foo, then just malloc(sizeof(struct foo)). The recommendation for power-of-2 allocations stems from the inefficiency of internal fragmentation, but modern malloc implementations intended for multiprocessor systems are starting to use CPU-local pools for allocations small enough for this to matter. That prevents the lock contention that used to result when multiple threads attempted to malloc at the same time and spent more time blocked because of fragmentation.

By allocating only what you need, you ensure that data structures are packed more densely in memory, which improves the cache hit rate, and that has a much bigger impact on performance than internal fragmentation. There are scenarios with very old malloc implementations and very high-end multiprocessor systems where explicitly padding allocations can provide a speedup, but your resources in that case would be better spent getting a better malloc implementation up and running on that system. Pre-padding also makes your code less portable, and prevents the user or the system from selecting the malloc behavior at run time, either programmatically or with environment variables.

Premature optimization is the root of all evil.



Answer 11:

You should use realloc() instead of malloc() when reallocating. http://www.cplusplus.com/reference/clibrary/cstdlib/realloc/

Always use a power of two? It depends on what your program is doing. If you need to reprocess your whole data structure when it grows to a power of two, yeah it makes sense. Otherwise, just allocate what you need and don't hog memory.