可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效，请关闭广告屏蔽插件后再试):

问题:

I am currently looking into malloc() implementation under Windows. But in my research I have stumbled upon things that puzzled me:

First, I know that at the API level, windows uses mostly the HeapAlloc() and VirtualAlloc() calls to allocate memory. I gather from here that the Microsoft implementation of malloc() (that which is included in the CRT - the C runtime) basically calls HeapAlloc() for blocks > 480 bytes and otherwise manage a special area allocated with VirtualAlloc() for small allocations, in order to prevent fragmentation.

Well that is all good and well. But then there are other implementation of malloc(), for instance nedmalloc, which claim to be up to 125% faster than Microsoft's malloc.

All this makes me wonder a few things:

Why can't we just call HeapAlloc() for small blocks? Does is perform poorly in regard to fragmentation (for example by doing "first-fit" instead of "best-fit")?
- Actually, is there any way to know what is going under the hood of the various API allocation calls? That would be quite helpful.
What makes nedmalloc so much faster than Microsoft's malloc?
From the above, I got the impression that HeapAlloc()/VirtualAlloc() are so slow that it is much faster for malloc() to call them only once in a while and then to manage the allocated memory itself. Is that assumption true? Or is the malloc() "wrapper" just needed because of fragmentation? One would think that system calls like this would be quick - or at least that some thoughts would have been put into them to make them efficient.
- If it is true, why is it so?
On average, how many (an order of magnitude) memory reads/write are performed by a typical malloc call (probably a function of the number of already allocated segments)? I would intuitively says it's in the tens for an average program, am I right?

回答1:

Calling HeapAlloc doesn't sound cross-platform. MS is free to change their implementation when they wish; advise to stay away. :)
It is probably using memory pools more effectively, much like the Loki library does with its "small object allocator"
Heap allocations, which are general purpose by nature, are always slow via any implementation. The more "specialized" the allocator, the faster it will be. This returns us to point #2, which deals with memory pools (and the allocation sizes used that are specific to your application).
Don't know.

回答2:

From the above, I got the impression that HeapAlloc()/VirtualAlloc() are so slow that it is much faster for malloc() to call them only once in a while and then to manage the allocated memory itself. Is that assumption true?

The OS-level system calls are designed and optimized for managing the entire memory space of processes. Using them to allocate 4 bytes for an integer is indeed suboptimal - you get overall better performance and memory usage by managing tiny allocations in library code, and letting the OS optimize for larger allocations. At least as far as I understand it.