I'm seeing the performance counter "# Induced GC" (which should stay at zero in a perfect app) increasing rapidly when processing small files (<= 32x32) via WriteableBitmap
.
While this isn't a significant bottleneck inside a small app, it becomes a very huge problem (app freezing at 99.75% "% Time in GC" for several seconds at each step) when there exist some thousand objects in memory (ex: EntityFramework
context loaded with many entities and relationships).
Synthetic test:
var objectCountPressure = (
from x in Enumerable.Range(65, 26)
let root = new DirectoryInfo((char)x + ":\\")
let subs =
from y in Enumerable.Range(0, 100 * IntPtr.Size)
let sub =new {DI = new DirectoryInfo(Path.Combine(root.FullName, "sub" + y)), Parent = root}
let files = from z in Enumerable.Range(0, 400) select new {FI = new FileInfo(Path.Combine(sub.DI.FullName, "file" + z)), Parent = sub}
select new {sub, files = files.ToList()}
select new {root, subs = subs.ToList()}
).ToList();
const int Size = 32;
Action<int> handler = threadnr => {
Console.WriteLine(threadnr + " => " + Thread.CurrentThread.ManagedThreadId);
for (int i = 0; i < 10000; i++) {
var wb = new WriteableBitmap(Size, Size, 96, 96, PixelFormats.Bgra32, null);
wb.Lock();
var stride = wb.BackBufferStride;
var blocks = stride / sizeof(int);
unsafe {
var row = (byte*)wb.BackBuffer;
for (int y = 0; y < wb.PixelHeight; y++, row += stride)
{
var start = (int*)row;
for (int x = 0; x < blocks; x++, start++)
*start = i;
}
}
wb.Unlock();
wb.Freeze(); }
};
var sw = Stopwatch.StartNew();
Console.WriteLine("start: {0:n3} ms", sw.Elapsed.TotalMilliseconds);
Parallel.For(0, Environment.ProcessorCount, new ParallelOptions{MaxDegreeOfParallelism = Environment.ProcessorCount}, handler);
Console.WriteLine("stop : {0:n2} s", sw.Elapsed.TotalSeconds);
GC.KeepAlive(objectCountPressure);
I can run this test using "const int Size = 48
" a dozen times: It always returns in ~1.5s and "# Induced GC" sometimes increases by 1 or 2.
When I change "const int Size = 48
" into "const int Size = 32
" then something very very bad is happening: "# Induced GC" increases by 10 per second and the overall runtime now is more than a minute: ~80s !
[Tested on Win7x64 Core-i7-2600 with 8GB RAM // .NET 4.0.30319.237 ]
WTF!?
Either the Framework has a very bad bug or I'm doing something entirely wrong.
BTW:
I came around this problem not by doing image processing but by just using a Tooltip containing an Image against some database entities via a DataTemplate:
This worked fine (fast) while there didn't exist very much objects in RAM -- but when there existed some million other objects (totally unrelated) then showing the Tooltip always delayed for several seconds, while everything else just was working fine.
Try this simple workaround:
Call
GC.AddMemoryPressure(128 * 1024)
once, this will numb the memory pressure mechanism.If it's not numb enough, give a bigger number.
TL;DR: Probably the best solution would be to create a small pool of
WriteableBitmaps
and reuse them rather than creating them and throwing them away.So I started spelunking around with WinDbg to see what was causing the collections to happen.
First I added a call to
Debugger.Break()
to the start ofMain
to make things easier. I also added my own call toGC.Collect()
as a sanity check to make sure my breakpoint worked ok. Then in WinDbg:So the breakpoint worked OK, but when I let the program continue it was never hit again. It seemed the GC routine was being called from somewhere deeper. Next I stepped into the
GC.Collect()
function to see what it was calling. To do this more easily I added a second call toGC.Collect()
immediately after the first and stepped into the second one. This avoided stepping through all the JIT compilation:After a little stepping I noticed a reference to
clr!GCInterface::Collect
which sounded promising. Unfortunately a breakpoint on it never triggered. Digging further intoGC.Collect()
I foundclr!WKS::GCHeap::GarbageCollect
which proved to be the real method. A breakpoint on this revealed the code that was triggering the collection:So
WriteableBitmap
's constructor indirectly calls GC.AddMemoryPressure, which eventually results in collections (incidentally,GC.AddMemoryPressure
is an easier way to simulate memory usage). This doesn't explain the sudden change in behaviour when going from a size of 33 to 32 though.ILSpy helps out here. In particular, if you look at the constructor for
SafeMILHandleMemoryPressure
(invoked bySafeMILHandle.UpdateEstimatedSize
) you'll see that it only usesGC.AddMemoryPressure
if the pressure to add is <= 8192. Otherwise it uses its own custom system for tracking memory pressure and triggering collections. A bitmap size of 32x32 with 32-bit pixels falls under this limit becauseWriteableBitmap
estimates the memory use as 32 * 32 * 4 * 2 (I'm not sure why the extra factor of 2 is there).In summary, it looks like the behaviour you're seeing is the result of a heuristic in the framework that doesn't work so well for your case.
You might be able to work around it by creating a bitmap with bigger dimensions or a bigger pixel format than you need so that the estimated memory size of the bitmap is > 8192.Afterthought: I guess this also suggests that collections triggered as a result of
GC.AddMemoryPressure
are counted under "# Induced GC"?Under all of the
SafeMILHandleMemoryPressure
andSafeMILHandle
nonsense is a call to a method onMS.Internal.MemoryPressure
, which uses the static field "_totalMemory
" to keep track of how much memory WPF thinks is allocated. When it hits a (rather small) limit, the induced GCs start and never end.You can stop WPF from behaving this way entirely using a little reflection magic; just set
_totalMemory
to something appropriately negative so the limit is never reached and the induced GCs never happen:Running Markus' code on Win7 x86 (T4300, 2.1GHz, 3GB):
(notice the huge difference between 33 and 32)
Is64BitOperatingSystem: False
Is64BitProcess: False
Version: 4.0.30319.237
Running Test with 40: 3,20 s
Running Test with 34: 1,14 s
Running Test with 33: 1,06 s
Running Test with 32: 64,41 s
Running Test with 30: 53,32 s
Running Test with 24: 29,01 s
Another machine Win7 x64 (Q9550, 2.8GHz, 8GB):
Is64BitOperatingSystem: True
Is64BitProcess: False
Version: 4.0.30319.237
Running Test with 40: 1,41 s
Running Test with 34: 1,24 s
Running Test with 33: 1,19 s
Running Test with 32: 1.554,45 s
Running Test with 30: 1.489,31 s
Running Test with 24: 842,66 s
Once again with 40: 7,21 s
The Q9550 CPU has a lot more power than the T4300, but it's running on 64bit OS.
This seems to slow down the whole thing.