At work we had a problem with OutOfMemoryExceptions. I wrote a small piece of code to mimic the behavior, and I ended up with the following mystery. Look at this simple program, which blows up when it runs out of memory:
    using System;
    using System.Collections.Generic;

    class Program
    {
        private static void Main()
        {
            List<byte[]> list = new List<byte[]>(200000);
            int iter = 0;
            try
            {
                for (;; iter++)
                {
                    list.Add(new byte[10000]);
                }
            }
            catch (OutOfMemoryException)
            {
                Console.WriteLine("Iterations: " + iter);
            }
        }
    }
On my machine it ended up with
Iterations: 148008
Then I added a GC.Collect call to the loop, invoked after each thousand iterations:
    //...
    for (;; iter++)
    {
        list.Add(new byte[10000]);
        if (iter % 1000 == 0)
            GC.Collect();
    }
    //...
And surprise:
Iterations: 172048
When I called GC.Collect after every 10 iterations, I even got 193716 iterations. There are two strange things:
1. How can a manual call to GC.Collect have such a severe impact (up to 30% more allocated)?
2. What the hell can the GC collect, when there are no "lost" references (I've even preset the List's capacity)?
I had a similar problem in .NET, with the difference that my byte[] arrays had random sizes. I tried two approaches:

1. write my own heap manager (allocate one large buffer up front and just hand out pieces of it by adjusting pointers); see the sketch after this list
2. use a memory-mapped file (in my opinion the better solution)
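A minimal sketch of the first approach. The BumpAllocator name and shape are mine, not from any library; it hands out views into one big pre-allocated buffer by advancing an offset, and can only release everything at once:

    using System;

    // Hypothetical helper, illustration only: serves slices of one large
    // pre-allocated buffer by advancing an offset ("bump" allocation).
    class BumpAllocator
    {
        private readonly byte[] _buffer;
        private int _offset;

        public BumpAllocator(int capacity)
        {
            _buffer = new byte[capacity]; // one large allocation, done once
        }

        // Returns a view into the buffer instead of a fresh byte[],
        // so no further GC allocations happen per block.
        public ArraySegment<byte> Allocate(int size)
        {
            if (_offset + size > _buffer.Length)
                throw new OutOfMemoryException("backing buffer exhausted");
            ArraySegment<byte> segment = new ArraySegment<byte>(_buffer, _offset, size);
            _offset += size;
            return segment;
        }

        // Individual blocks cannot be freed; everything is released at once.
        public void Reset()
        {
            _offset = 0;
        }
    }

The trade-off is the usual arena trade-off: you avoid per-array GC overhead and fragmentation, but you can only reclaim the whole buffer in one go.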
If possible, try out .NET 4.5: http://blogs.msdn.com/b/dotnet/archive/2012/07/20/the-net-framework-4-5-includes-new-garbage-collector-enhancements-for-client-and-server-apps.aspx
A part of the garbage collection process is the compacting phase. During this phase, blocks of allocated memory are moved around to reduce fragmentation. When memory is allocated, it isn't always placed right where the last allocated chunk left off, so gaps accumulate. You are able to squeeze a bit more in because the garbage collector is making more room by making better use of the available space.
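You can get a rough, indirect look at this with GC.GetTotalMemory (a sketch; it shows reclaimed bytes after a forced collection, not the address moves themselves, and the numbers will vary by runtime):

    using System;

    class Probe
    {
        static void Main()
        {
            byte[][] garbage = new byte[1000][];
            for (int i = 0; i < garbage.Length; i++)
                garbage[i] = new byte[10000];
            garbage = null;                          // make the arrays unreachable

            long before = GC.GetTotalMemory(false);  // heap size, no collection forced
            GC.Collect();                            // full blocking collection
            GC.WaitForPendingFinalizers();
            long after = GC.GetTotalMemory(true);    // heap size after forced collection

            Console.WriteLine("Before: {0:N0} bytes, after: {1:N0} bytes", before, after);
        }
    }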
I am trying to run some tests, but my machine can't handle them. Give this a try: it tells the GC to pin the objects in memory so they aren't moved around. A sketch of what such a test could look like is below.
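The exact code the answer refers to isn't in the post; this is my sketch of such a pinning test. GCHandle.Alloc with GCHandleType.Pinned is a real API, while the rest of the shape is assumed from the question's program:

    using System;
    using System.Collections.Generic;
    using System.Runtime.InteropServices;

    class PinnedTest
    {
        static void Main()
        {
            List<byte[]> list = new List<byte[]>(200000);
            List<GCHandle> handles = new List<GCHandle>();
            int iter = 0;
            try
            {
                for (;; iter++)
                {
                    byte[] block = new byte[10000];
                    // Pinning forbids the GC from relocating this array
                    // during the compacting phase.
                    handles.Add(GCHandle.Alloc(block, GCHandleType.Pinned));
                    list.Add(block);
                }
            }
            catch (OutOfMemoryException)
            {
                Console.WriteLine("Iterations: " + iter);
            }
            finally
            {
                foreach (GCHandle h in handles)
                    h.Free(); // always release pins
            }
        }
    }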
As for your comment: when the GC moves things around, it isn't wiping anything out; it is just making better use of the memory space. Let's oversimplify this. When you allocate your byte array the first time, say it lands in memory from address 0 to 10000. The next time you allocate a byte array, it isn't guaranteed to start at 10001; it may start at 10500. So now you have 499 bytes that aren't being used and won't be used by your application. When the GC does compacting, it moves the array at 10500 down to 10001 so those 499 bytes can be used again. And again, this is way oversimplified.

Depending on the CLR you're using, there may be some Large Object Heap issues involved.
Have a look at this article, which explains the issues with large block allocations (the list with 200000 items is a large block for sure; the byte arrays may or may not be — some arrays seem to be put on the LOH when they reach 8 k, others only after 85 k):
http://www.simple-talk.com/dotnet/.net-framework/the-dangers-of-the-large-object-heap/
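A quick way to probe where an array landed (a sketch; it relies on the observable quirk that freshly allocated LOH objects report generation 2, and on the double[] exception existing on the 32-bit .NET Framework):

    using System;

    class LohProbe
    {
        static void Main()
        {
            // Freshly allocated objects normally start in generation 0;
            // LOH allocations report generation 2 right away.
            byte[] small = new byte[8000];      // well under 85,000 bytes
            byte[] large = new byte[100000];    // over the 85,000-byte threshold
            double[] dbls = new double[1000];   // 8,000 bytes, but may go to LOH on 32-bit

            Console.WriteLine(GC.GetGeneration(small)); // expected: 0
            Console.WriteLine(GC.GetGeneration(large)); // expected: 2 (LOH)
            Console.WriteLine(GC.GetGeneration(dbls));  // 2 on 32-bit .NET Framework
        }
    }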
The CLR occasionally places arrays on the LOH. If you ever look at a memory dump through WinDbg, you will see that there are arrays under 85,000 bytes there. It is undocumented behavior, but that's just the way it works.
You are getting the OutOfMemoryExceptions because you are fragmenting the LOH, and the LOH is never compacted.
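As an aside that postdates this answer: since .NET Framework 4.5.1 you can opt in to a one-off LOH compaction (GCSettings lives in System.Runtime; a sketch):

    using System;
    using System.Runtime;

    class LohCompaction
    {
        static void Main()
        {
            // Available since .NET Framework 4.5.1: ask for the LOH to be
            // compacted on the next blocking full collection, once.
            GCSettings.LargeObjectHeapCompactionMode =
                GCLargeObjectHeapCompactionMode.CompactOnce;
            GC.Collect();
        }
    }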
With regard to your question of what the GC can collect when there are no "lost" references: there are overwritten references to the new byte[10000] that you pass to Add. A local variable is compiled and assigned the new byte[10000]. For every iteration of the loop you create a new byte[] with a predefined size of 10000, and it is assigned to that local variable. Any previous value of the variable is overwritten, and that memory is eligible for collection the next time the GC runs for the generation the variable lives in (in this case, possibly the LOH).
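Roughly what the loop body compiles down to (a sketch; the temporary is unnamed in the actual IL, and the name 'temp' is illustrative only):

    for (;; iter++)
    {
        // Illustrative only: the compiler effectively introduces a temporary
        // holding the new array before the Add call.
        byte[] temp = new byte[10000];
        list.Add(temp);
        // On the next pass 'temp' is overwritten with a fresh array reference.
    }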