Our application continuously allocates arrays for large quantities of data (say tens to hundreds of megabytes) which live for a shortish amount of time before being discarded.
Done naively this can cause large object heap fragmentation, eventually causing the application to crash with an OutOfMemoryException despite the size of the currently live objects not being excessive.
One way we have successfully managed this in the past is to chunk up the arrays to ensure they don't end up on the LOH, the idea being to avoid fragmentation by allowing memory to be compacted by the garbage collector.
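As a rough illustration of the sizing involved: the .NET runtime places objects of 85,000 bytes or more on the LOH, so every chunk has to stay below that. The sketch below uses Python purely as a calculator. The names `max_chunk_elements` and `chunk_lengths` and the 16-byte per-array overhead allowance are assumptions made for illustration, not our actual code.

```python
LOH_THRESHOLD_BYTES = 85_000  # .NET allocates objects of this size or larger on the LOH
ARRAY_OVERHEAD_BYTES = 16     # assumed, deliberately generous allowance for the array header


def max_chunk_elements(element_size, threshold=LOH_THRESHOLD_BYTES,
                       overhead=ARRAY_OVERHEAD_BYTES):
    """Largest element count whose array stays strictly below the threshold."""
    return (threshold - overhead - 1) // element_size


def chunk_lengths(total_elements, element_size):
    """Split a logical array into per-chunk element counts."""
    limit = max_chunk_elements(element_size)
    full, rest = divmod(total_elements, limit)
    return [limit] * full + ([rest] if rest else [])


if __name__ == "__main__":
    print(max_chunk_elements(1))          # chunk length for a byte[]
    print(len(chunk_lengths(50_000_000, 1)))  # chunks needed for ~50 MB of bytes
```

One caveat worth checking for a 32-bit process: the 32-bit runtime is reported to place double[] arrays of 1,000 or more elements on the LOH regardless of their byte size, so a purely byte-based limit like this one would not be sufficient for double[] chunks there.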
Our latest application handles more data than before, and passes this serialized data very frequently between add-ins hosted in either separate AppDomains or separate processes. We adopted the same approach as before, ensuring our memory was always chunked and being very careful to avoid large object heap allocations.
However, we have one add-in that must be hosted in an external 32-bit process (because our main application is 64-bit and the add-in must use a 32-bit library). Under particularly heavy load, when a lot of SOH memory chunks are being allocated quickly and discarded shortly afterwards, even our chunking approach hasn't been enough to save our 32-bit add-in, and it crashes with an OutOfMemoryException.
Using WinDbg at the moment when an OutOfMemoryException occurs, !heapstat -inclUnrooted
shows this:
Heap     Gen0     Gen1       Gen2         LOH
Heap0    24612    4166452    228499692    9757136
Free space:                                          Percentage
Heap0    12       12         4636044      12848      SOH: 1% LOH: 0%
Unrooted objects:                                    Percentage
Heap0    72       0          5488         0          SOH: 0% LOH: 0%
!dumpheap -stat
shows this:
-- SNIP --
79b56c28     3085       435356 System.Object[]
79b8ebd4        1      1048592 System.UInt16[]
79b9f9ac    26880      1301812 System.String
002f7a60       34      4648916 Free
79ba4944     6128     87366192 System.Byte[]
79b8ef28    17195    145981324 System.Double[]
Total 97166 objects
Fragmented blocks larger than 0.5 MB:
Addr Size Followed by
18c91000 3.7MB 19042c7c System.Threading.OverlappedData
These tell me that our memory usage isn't excessive, and our large object heap is very small as expected (so we're definitely not dealing with large object heap fragmentation here).
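As a quick sanity check on those figures, the arithmetic can be reproduced from the !heapstat numbers above. This is only a calculator-style sketch in Python, and the variable names are mine.

```python
# Figures copied from the !heapstat output above (all in bytes).
gen_sizes = {"Gen0": 24612, "Gen1": 4166452, "Gen2": 228499692}
loh_size = 9757136
soh_free = 12 + 12 + 4636044   # free space in Gen0 + Gen1 + Gen2
loh_free = 12848

soh_size = sum(gen_sizes.values())
total_heap = soh_size + loh_size

soh_free_pct = 100 * soh_free / soh_size
loh_free_pct = 100 * loh_free / loh_size

print(f"SOH size:        {soh_size:>12,} bytes")
print(f"LOH size:        {loh_size:>12,} bytes")
print(f"Total GC heap:   {total_heap:>12,} bytes")
print(f"SOH free space:  {soh_free_pct:.2f}%")
print(f"LOH free space:  {loh_free_pct:.2f}%")
```

The total comes to 242,447,892 bytes (about 231 MiB), with the LOH accounting for roughly 4% of it, and free space inside the heaps is about 2% or less, so fragmentation inside the managed heaps themselves does not look like the issue.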
However, !eeheap -gc
shows this:
Number of GC Heaps: 1
generation 0 starts at 0x7452b504
generation 1 starts at 0x741321d0
generation 2 starts at 0x01f91000
ephemeral segment allocation context: none
segment begin allocated size
01f90000 01f91000 02c578d0 0xcc68d0(13396176)
3cb10000 3cb11000 3d5228b0 0xa118b0(10557616)
3ece0000 3ece1000 3fc2ef48 0xf4df48(16047944)
3db10000 3db11000 3e8fc8f8 0xdeb8f8(14596344)
42e20000 42e21000 4393e1f8 0xb1d1f8(11653624)
18c90000 18c91000 19c53210 0xfc2210(16523792)
14c90000 14c91000 15c85c78 0xff4c78(16731256)
15c90000 15c91000 168b2870 0xc21870(12720240)
16c90000 16c91000 17690744 0x9ff744(10483524)
5c0c0000 5c0c1000 5d05381c 0xf9281c(16328732)
69c80000 69c81000 6a88bc88 0xc0ac88(12627080)
6b2d0000 6b2d1000 6b83e8a0 0x56d8a0(5691552)
6c2d0000 6c2d1000 6d0f2608 0xe21608(14816776)
6d2d0000 6d2d1000 6defc67c 0xc2b67c(12760700)
6e2d0000 6e2d1000 6ee7f304 0xbae304(12247812)
70000000 70001000 70bfb41c 0xbfa41c(12559388)
71ca0000 71ca1000 72893440 0xbf2440(12526656)
73b40000 73b41000 74531528 0x9f0528(10421544)
Large object heap starts at 0x02f91000
segment begin allocated size
02f90000 02f91000 038df1d0 0x94e1d0(9757136)
Total Size: Size: 0xe737614 (242447892) bytes.
------------------------------
GC Heap Size: Size: 0xe737614 (242447892) bytes.
The thing that strikes me here is that our final SOH segment starts at 0x73b41000, which is right at the limit of the address space available to our 32-bit add-in.
So if I'm reading that correctly, our problem seems to be that our virtual address space has become fragmented with managed heap segments.
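To put numbers on that reading, here is a small Python sketch that works from the !eeheap -gc listing above. It assumes the default 2 GB user-mode address space for a 32-bit process (`USER_SPACE_LIMIT` is my assumption and would be larger for a large-address-aware process). Note also that the gaps it reports between managed segments are not necessarily free: they may hold native allocations, loaded modules or thread stacks, and the listing alone cannot tell which.

```python
USER_SPACE_LIMIT = 0x80000000  # assumed: default 2 GB user-mode space for a 32-bit process

# (segment, begin, allocated) triples copied from the !eeheap -gc output.
SOH_SEGMENTS = [
    (0x01f90000, 0x01f91000, 0x02c578d0), (0x3cb10000, 0x3cb11000, 0x3d5228b0),
    (0x3ece0000, 0x3ece1000, 0x3fc2ef48), (0x3db10000, 0x3db11000, 0x3e8fc8f8),
    (0x42e20000, 0x42e21000, 0x4393e1f8), (0x18c90000, 0x18c91000, 0x19c53210),
    (0x14c90000, 0x14c91000, 0x15c85c78), (0x15c90000, 0x15c91000, 0x168b2870),
    (0x16c90000, 0x16c91000, 0x17690744), (0x5c0c0000, 0x5c0c1000, 0x5d05381c),
    (0x69c80000, 0x69c81000, 0x6a88bc88), (0x6b2d0000, 0x6b2d1000, 0x6b83e8a0),
    (0x6c2d0000, 0x6c2d1000, 0x6d0f2608), (0x6d2d0000, 0x6d2d1000, 0x6defc67c),
    (0x6e2d0000, 0x6e2d1000, 0x6ee7f304), (0x70000000, 0x70001000, 0x70bfb41c),
    (0x71ca0000, 0x71ca1000, 0x72893440), (0x73b40000, 0x73b41000, 0x74531528),
]
LOH_SEGMENTS = [(0x02f90000, 0x02f91000, 0x038df1d0)]

segments = sorted(SOH_SEGMENTS + LOH_SEGMENTS)

used = sum(allocated - begin for _, begin, allocated in segments)
lowest = segments[0][0]
highest = max(allocated for _, _, allocated in segments)
span = highest - lowest
headroom = USER_SPACE_LIMIT - highest

# Distance from the end of each segment's allocated range to the next segment's start.
gaps = [nxt[0] - cur[2] for cur, nxt in zip(segments, segments[1:])]
largest_gap = max(gaps)

MIB = 1024 * 1024
print(f"managed bytes in use:        {used / MIB:8.1f} MiB")
print(f"address range covered:       {span / MIB:8.1f} MiB")
print(f"share of that range in use:  {100 * used / span:8.1f} %")
print(f"space above the last segment:{headroom / MIB:8.1f} MiB")
print(f"largest gap between segments:{largest_gap / MIB:8.1f} MiB")
```

On these figures the managed segments occupy only about an eighth of the address range they are spread across, which is consistent with the address space being fragmented or being consumed by something other than the GC heap. Something like !address -summary in WinDbg would be needed to see what actually sits in the gaps.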
I guess my questions here would be:
- Is my analysis correct?
- Is our approach to avoiding LOH fragmentation using chunking reasonable?
- Is there a good strategy to avoid the memory fragmentation we now appear to be seeing?
The most obvious answer I can think of is to pool and re-use our memory chunks. This is potentially do-able, but is something I would rather avoid as it involves us effectively managing that part of our memory ourselves.
For those interested, here is an update of what I found out with regards to this problem:
It appeared that the best solution was to implement pooling of our chunks to relieve pressure on the garbage collector, so I did this.
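For anyone unfamiliar with the idea, the pooling boils down to something like the following. This is a minimal, language-neutral sketch written in Python rather than our actual .NET code; `ChunkPool`, `rent` and `give_back` are hypothetical names, and the sketch is not thread-safe.

```python
class ChunkPool:
    """Hands out fixed-size byte buffers and takes them back for reuse."""

    def __init__(self, chunk_size, max_retained):
        self._chunk_size = chunk_size
        self._max_retained = max_retained  # cap so the pool itself cannot grow unbounded
        self._free = []

    def rent(self):
        # Reuse a returned chunk if one is available, otherwise allocate a new one.
        if self._free:
            return self._free.pop()
        return bytearray(self._chunk_size)

    def give_back(self, chunk):
        if len(chunk) != self._chunk_size:
            raise ValueError("chunk does not belong to this pool")
        if len(self._free) < self._max_retained:
            chunk[:] = bytes(self._chunk_size)  # zero the buffer so stale data cannot leak
            self._free.append(chunk)
        # Otherwise drop the chunk and let the garbage collector reclaim it.


if __name__ == "__main__":
    pool = ChunkPool(chunk_size=64 * 1024, max_retained=128)
    buf = pool.rent()
    buf[0] = 42
    pool.give_back(buf)
    print(pool.rent() is buf)  # the same buffer object is handed out again
```

The cap on retained chunks is a design choice: without it, a burst of activity would leave the pool holding its peak memory for the rest of the process lifetime.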
The result was that the add-in got slightly further in its task, but unfortunately it still ran out of memory fairly quickly.
Looking in WinDbg again, the only real difference I could see was that our combined managed heap size was consistently smaller, at around 200MB compared to around 250MB before pooling.
It was almost as if the amount of memory available to .NET was decreasing over time, and so implementing the pooling had simply delayed running out of memory.
If this was true, the obvious culprit was a COM component which we use to load the data into memory. We do some caching of COM objects to improve repeated access time to the data. I removed all the caching and ensured everything was released after every query of the data.
Now everything looks fine with regards to memory; it is just much slower (which I will have to solve next).
I guess in hindsight the COM component should have been the first suspect for the memory issues, but hey I learned something :) And on the plus side, the pooling will still be useful to decrease GC overhead, so that was worth doing as well.
Thanks for your comments everyone.