As the title states, I have a problem with high page file activity.
I am developing a program that processes a lot of images, which it loads from the hard drive. From every image it generates some data, which I save in a list. For every 3600 images, I save the list to the hard drive; its size is about 5 to 10 MB. The program runs as fast as it can, so it maxes out one CPU thread.
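In outline, the loop looks something like this (a sketch; ExtractData, SaveToDisk and ImageData are stand-ins for the real code):

    using System.Collections.Generic;
    using Emgu.CV;
    using Emgu.CV.Structure;

    void ProcessAll(IList<string> imagePaths)
    {
        var results = new List<ImageData>();
        foreach (string path in imagePaths)
        {
            // Emgu.CV loads the file via GDI+ under the hood
            using (var image = new Image<Bgr, byte>(path))
            {
                results.Add(ExtractData(image));
            }
            if (results.Count == 3600)
            {
                SaveToDisk(results); // each batch is about 5-10 MB on disk
                results.Clear();
            }
        }
        if (results.Count > 0)
            SaveToDisk(results);
    }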
The program works and generates the data it is supposed to, but when I analyze it in Visual Studio I get a warning: DA0014: Extremely high rates of paging active memory to disk.
The memory consumption of the program, according to Task Manager, is about 50 MB and seems to be stable. When I ran the program I had about 2 GB free out of 4 GB, so I don't think I am running out of RAM. Screenshot: http://i.stack.imgur.com/TDAB0.png
The DA0014 rule description says: "The number of Pages Output/sec is frequently much larger than the number of Page Writes/sec, for example, because Pages Output/sec also includes changed data pages from the system file cache. However, it is not always easy to determine which process is directly responsible for the paging or why."
Does this mean that I get this warning simply because I read a lot of images from the hard drive, or is it something else? Not really sure what kind of bug I am looking for.
EDIT: Each image is about 300 KB. I dispose of each one before loading the next.
UPDATE: From experiments, it looks like the paging comes simply from loading the large number of files. As I am no expert in C# or the underlying GDI+ API, I don't know which of the answers is most correct. I chose Andras Zoltan's answer because it was well explained and because he clearly did a lot of work to explain the cause to a newcomer like me :)
Updated following more info
The working set of your application might not be very big, but what about its virtual memory size? Paging can occur because of the virtual size, not just the physical size. See this screenshot from Process Explorer of VS2012 running on Windows 8:
And in Task Manager? Apparently the private working set for the same process is 305,376 KB.
We can take from this a) that Task Manager can't necessarily be trusted and b) that an application's size in memory, as far as the OS is concerned, is far more complicated than we'd like to think.
You might want to take a look at this.
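If you'd rather read these figures from inside the process than eyeball two different tools, System.Diagnostics.Process exposes all of them. A minimal sketch:

    using System;
    using System.Diagnostics;

    class MemoryReport
    {
        static void Main()
        {
            var p = Process.GetCurrentProcess();
            // WorkingSet64 is roughly Task Manager's default memory column;
            // VirtualMemorySize64 is the much larger figure Process Explorer
            // shows as "Virtual Size".
            Console.WriteLine("Working set:   {0:N0} KB", p.WorkingSet64 / 1024);
            Console.WriteLine("Private bytes: {0:N0} KB", p.PrivateMemorySize64 / 1024);
            Console.WriteLine("Virtual size:  {0:N0} KB", p.VirtualMemorySize64 / 1024);
        }
    }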
The paging is almost certainly because of what you do with the files, and the high final figures are almost certainly because of the number of files you're working with. A simple test would be to experiment with different numbers of files and record the final paging figures alongside each run. If the number of files is causing the paging, you'll see a clear correlation.
Then take out any processing you do (but keep the image loading) and compare again; note the difference.
Then stub out the image-loading code completely - note the difference.
Clearly you'll see the biggest drop in faults when you take out the image loading.
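To put numbers on those runs, you could sample the very counter the DA0014 rule is built on. A rough sketch (Windows-only; workload stands in for your load-and-process loop, and since the counter is machine-wide you'd want the box otherwise quiet):

    using System;
    using System.Diagnostics;

    static float MeasurePagesOutputPerSec(Action workload)
    {
        // "Pages Output/sec" under the "Memory" category is the system-wide
        // counter that DA0014 compares against Page Writes/sec.
        using (var counter = new PerformanceCounter("Memory", "Pages Output/sec"))
        {
            counter.NextValue();        // first call primes the counter and returns 0
            workload();                 // e.g. load and process N files
            return counter.NextValue(); // average rate since the priming call
        }
    }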
Now, looking at the Emgu.CV Image code, it uses the Image class internally to get the image bits, so it's firing up GDI+ via the function GdipLoadImageFromFile (second entry on this index) to decode the image (using system resources, plus potentially large byte arrays), and then it copies the data to an uncompressed byte array containing the actual RGB values. That byte array is allocated using GCHandle.Alloc (surrounded by GC.AddMemoryPressure and GC.RemoveMemoryPressure) to create a pinned byte array that holds the uncompressed image data. Now, I'm no expert on .NET memory management, but it seems to me that we have the potential for heap fragmentation here, even if each file is loaded sequentially and not in parallel. Whether that's causing the hard paging I don't know, but it seems likely.
In particular, the in-memory representation of the image could be geared specifically toward display, as opposed to matching the original file bytes. So if we're talking JPEGs, for example, a 300 KB JPEG could be considerably larger in physical memory, depending on its dimensions. E.g. a 1024x768 32-bit image is 3 MB, and that gets allocated twice for each image, since it's loaded (first allocation) and then copied (second allocation) into the EMGU image object before being disposed.
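To make that concrete, here's a simplified sketch of the allocation pattern just described (the real Emgu.CV code is more involved, and the native decode step is stubbed out):

    using System;
    using System.Runtime.InteropServices;

    static void AllocatePixelBuffer(int width, int height)
    {
        int bytes = width * height * 4;  // 32 bpp: 1024 * 768 * 4 = 3 MB
        var buffer = new byte[bytes];

        // Pin the array so native code can write through a stable pointer;
        // pinned objects cannot be moved by the GC, which is where the
        // fragmentation worry comes from.
        GCHandle handle = GCHandle.Alloc(buffer, GCHandleType.Pinned);
        GC.AddMemoryPressure(bytes);
        try
        {
            IntPtr ptr = handle.AddrOfPinnedObject();
            // ... the uncompressed pixels would be written through ptr here ...
        }
        finally
        {
            handle.Free();
            GC.RemoveMemoryPressure(bytes);
        }
    }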
But you have to ask yourself whether it's necessary to find a way around the problem. If your application is not consuming vast amounts of physical RAM, it will have much less of an impact on other applications; one process hitting the page file a lot won't badly affect another process that doesn't, provided there's sufficient physical memory.
The devil is in that cop-out note. Bitmaps are mapped into memory from the file that contains the pixel data, using a memory-mapped file. That's an efficient way to avoid reading and writing the data directly into/from RAM; you only pay for what you use. The mechanism that keeps the file in sync with RAM is paging. So it is inevitable that, if you process a lot of images, you'll see a lot of page faults. The tool you use just isn't smart enough to know that this is by design.
Feature, not a bug.
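If you really wanted to keep the paging counters quiet, the usual trick is to read the file into memory yourself and let GDI+ decode from a stream, so no file mapping is created. A sketch (standard System.Drawing; the copy at the end detaches the bitmap from the stream, which would otherwise have to stay open for the Bitmap's lifetime):

    using System.Drawing;
    using System.IO;

    static Bitmap LoadWithoutFileMapping(string path)
    {
        // Read the whole file into RAM, then decode from the memory stream;
        // GDI+ never memory-maps the file this way.
        byte[] bytes = File.ReadAllBytes(path);
        using (var ms = new MemoryStream(bytes))
        using (var original = new Bitmap(ms))
        {
            return new Bitmap(original); // deep copy, independent of the stream
        }
    }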