Reason for ~100x slowdown with heap memory functions

Posted 2020-05-19 07:28

I'm trying to track down a huge slowdown in the heap memory functions in Windows Vista and Windows 7 (I didn't test on any server editions). It doesn't happen on Windows XP at all, only on Microsoft's newer operating systems.

I originally ran into this problem with PHP compiled on Windows. The scripts themselves seemed to run at the expected speed, but after script execution I was experiencing 1-2 seconds of delay in the internal PHP shutdown functions. After firing up the debugger, I saw that it had to do with the PHP memory manager's use of HeapAlloc/HeapFree/HeapReAlloc.

I traced it down to the use of the flag HEAP_NO_SERIALIZE on the heap functions:

#ifdef ZEND_WIN32
#define ZEND_DO_MALLOC(size) (AG(memory_heap) ? HeapAlloc(AG(memory_heap), HEAP_NO_SERIALIZE, size) : malloc(size))
#define ZEND_DO_FREE(ptr) (AG(memory_heap) ? HeapFree(AG(memory_heap), HEAP_NO_SERIALIZE, ptr) : free(ptr))
#define ZEND_DO_REALLOC(ptr, size) (AG(memory_heap) ? HeapReAlloc(AG(memory_heap), HEAP_NO_SERIALIZE, ptr, size) : realloc(ptr, size))
#else
#define ZEND_DO_MALLOC(size) malloc(size)
#define ZEND_DO_FREE(ptr) free(ptr)
#define ZEND_DO_REALLOC(ptr, size) realloc(ptr, size)
#endif

and in the function start_memory_manager, where the HeapCreate flag actually sets the default for HeapAlloc/HeapFree/HeapReAlloc:

#ifdef ZEND_WIN32
    AG(memory_heap) = HeapCreate(HEAP_NO_SERIALIZE, 256*1024, 0);
#endif

I removed the HEAP_NO_SERIALIZE parameter (replaced with 0), and it fixed the problem. Scripts now clean up quickly in both the CLI and the Apache 2 SAPI version. This was for PHP 4.4.9, but the PHP 5 and 6 source code (in development) contains the same flag on the calls.

I'm not sure if what I did was dangerous or not. It's all a part of the PHP memory manager, so I'm going to have to do some digging and research, but this brings up the question:

Why are the heap memory functions so slow on Windows Vista and Windows 7 with HEAP_NO_SERIALIZE?

While researching this problem I came up with exactly one good hit. Please read the blog post http://www.brainfarter.net/?p=69 where the poster explains the issue and offers a test case (both source and binaries available) to highlight the issue.

My test on a Windows 7 x64 quad-core machine with 8 GB of RAM gives 43,836. Ouch! The same test without the HEAP_NO_SERIALIZE flag gives 655, ~70x faster in my case.
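The blog's test case is not reproduced here. As a rough sketch of the kind of measurement involved, the helper below times an allocate-then-free loop on a private heap created with a given flag. The name `time_alloc_free` and its parameters are hypothetical, and the non-Windows branch (plain malloc/free, flag ignored) exists only so the sketch compiles elsewhere. It is not the blog author's benchmark, and its numbers are not comparable to the figures above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <time.h>

#ifdef _WIN32
#include <windows.h>
#else
/* Assumption: stand-in value so the sketch compiles off Windows; ignored there. */
#define HEAP_NO_SERIALIZE 0x00000001
#endif

/*
 * Hypothetical helper: allocate `count` blocks of `size` bytes from a private
 * heap created with `flags`, free them all, and return the elapsed time in
 * seconds as reported by clock(). Returns -1.0 on failure.
 */
double time_alloc_free(unsigned long flags, size_t count, size_t size)
{
    void **blocks;
    size_t i, allocated = 0;
    clock_t start, stop;
    double result = -1.0;
#ifdef _WIN32
    HANDLE heap;
#endif

    assert(count > 0 && size > 0);

#ifdef _WIN32
    /* The flag given to HeapCreate becomes the default for later calls. */
    heap = HeapCreate((DWORD)flags, 256 * 1024, 0);
    if (heap == NULL)
        return -1.0;
#else
    (void)flags;
#endif

    blocks = (void **)malloc(count * sizeof *blocks);
    if (blocks != NULL) {
        start = clock();
        for (i = 0; i < count; i++) {
#ifdef _WIN32
            blocks[i] = HeapAlloc(heap, 0, size);
#else
            blocks[i] = malloc(size);
#endif
            if (blocks[i] == NULL)
                break;
            allocated++;
        }
        for (i = 0; i < allocated; i++) {
#ifdef _WIN32
            HeapFree(heap, 0, blocks[i]);
#else
            free(blocks[i]);
#endif
        }
        stop = clock();
        /* Only report a time if every allocation succeeded. */
        if (allocated == count)
            result = (double)(stop - start) / CLOCKS_PER_SEC;
        free(blocks);
    }

#ifdef _WIN32
    HeapDestroy(heap);
#endif
    return result;
}
```

Calling `time_alloc_free(HEAP_NO_SERIALIZE, n, 64)` and `time_alloc_free(0, n, 64)` with the same `n` on the same machine gives two timings to compare. The block count and size that make the gap visible are something to experiment with.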

Lastly, it seems that any program created with Visual C++ 6 using malloc/free or new/delete is affected on these newer platforms. The Visual C++ 2008 compiler doesn't set this flag by default for those functions/operators, so they aren't affected, but that still leaves a lot of programs affected!

I encourage you to download the proof of concept and give this a try. This problem explained why my normal PHP on Windows installation was crawling and may explain why Windows Vista and Windows 7 seem slower at times.

UPDATE 2010-01-26: I received a response from Microsoft stating that the low-fragmentation heap (LFH) is the de facto default policy for heaps that hold any appreciable number of allocations. In Windows Vista, they reorganized a lot of code to remove extra data structures and code paths that were no longer part of the common case for handling heap API calls. With the HEAP_NO_SERIALIZE flag and in certain debugging situations, they do not allow the use of the LFH, and we get stuck on the slow and less optimized path through the heap manager. So... it's highly recommended not to use HEAP_NO_SERIALIZE, since you'll miss out on all the work that went into the LFH and any future work in the Windows heap API.
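One way to check which policy a heap ended up with is to ask the heap itself. The sketch below uses the documented HeapSetInformation and HeapQueryInformation calls with HeapCompatibilityInformation, where 0 means a standard heap, 1 a heap with look-aside lists, and 2 the LFH. The helper names `heap_mode_name` and `query_heap_mode` are hypothetical, and the non-Windows branch simply returns -1 so the sketch compiles elsewhere.

```c
#ifdef _WIN32
#include <windows.h>
#endif

/* Map a HeapCompatibilityInformation value to a readable name. */
const char *heap_mode_name(unsigned long mode)
{
    switch (mode) {
    case 0:  return "standard heap";
    case 1:  return "look-aside lists";
    case 2:  return "low-fragmentation heap (LFH)";
    default: return "unknown";
    }
}

/*
 * Hypothetical helper: create a private heap with `flags`, request the LFH,
 * and return the mode the heap reports afterwards. Returns -1 if the heap
 * cannot be created or queried, or when not built for Windows.
 */
long query_heap_mode(unsigned long flags)
{
#ifdef _WIN32
    ULONG mode = 2; /* 2 requests the LFH */
    long result = -1;
    HANDLE heap = HeapCreate((DWORD)flags, 256 * 1024, 0);

    if (heap == NULL)
        return -1;

    /* The request may be refused (for example with HEAP_NO_SERIALIZE);
       the return value is ignored because the query below reports the
       mode actually in effect. */
    HeapSetInformation(heap, HeapCompatibilityInformation,
                       &mode, sizeof mode);

    mode = 0;
    if (HeapQueryInformation(heap, HeapCompatibilityInformation,
                             &mode, sizeof mode, NULL))
        result = (long)mode;

    HeapDestroy(heap);
    return result;
#else
    (void)flags;
    return -1;
#endif
}
```

If Microsoft's explanation holds, `query_heap_mode(0)` should report the LFH on Vista and later, while `query_heap_mode(HEAP_NO_SERIALIZE)` should not. Treat that as something to verify on the target system rather than a guaranteed result.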

1 answer
骚年 ilove
Reply #2 · 2020-05-19 08:10

The first difference I noticed is that Windows Vista always uses the Low Fragmentation Heap (LFH). Windows XP does not seem to. RtlFreeHeap in Windows Vista is a lot shorter as a result -- all the work is delegated to RtlpLowFragHeapFree. More information regarding LFH and its presence in various OSs. Note the red warning at the top.

More information (remarks section):

Windows XP, Windows Server 2003, and Windows 2000 with hotfix KB 816542:

A look-aside list is a fast memory allocation mechanism that contains only fixed-sized blocks. Look-aside lists are enabled by default for heaps that support them. Starting with Windows Vista, look-aside lists are not used and the LFH is enabled by default.

Another important piece of information: LFH and NO_SERIALIZE are mutually exclusive (both cannot be active simultaneously). Combine that with this part of the quote above:

Starting with Windows Vista, look-aside lists are not used

This implies that setting NO_SERIALIZE in Windows Vista disables LFH, but it does not (and cannot) fall back to standard look-aside lists (as a fast replacement), according to the above quote. I'm unclear as to what heap allocation strategy Windows Vista uses when NO_SERIALIZE is specified. It looks like it's using something horribly naïve, based on its performance.

Even more information:

Looking at a few stack snapshots of allocspeed.exe, it seems to always be in a Ready state (not Running or Wait), and in TryEnterCriticalSection from HeapFree, and pegging the CPU at nearly 100% load for 40 seconds. (On Windows Vista.)

Sample snapshot:

ntdll.dll!RtlInterlockedPushEntrySList+0xe8
ntdll.dll!RtlTryEnterCriticalSection+0x33b
kernel32.dll!HeapFree+0x14
allocspeed.EXE+0x11ad
allocspeed.EXE+0x1e15
kernel32.dll!BaseThreadInitThunk+0x12
ntdll.dll!LdrInitializeThunk+0x4d

Which is strange, because NO_SERIALIZE precisely tells it to skip lock acquisition. Something doesn't add up.

This is a question only Raymond Chen or Mark Russinovich could answer :)
