I'd like to optimize this piece of code :
public void PopulatePixelValueMatrices(GenericImage image,int Width, int Height)
{
for (int x = 0; x < Width; x++)
{
for (int y = 0; y < Height; y++)
{
Byte pixelValue = image.GetPixel(x, y).B;
this.sumOfPixelValues[x, y] += pixelValue;
this.sumOfPixelValuesSquared[x, y] += pixelValue * pixelValue;
}
}
}
This is to be used for image processing, and we're currently running this for about 200 images. We've optimized the GetPixel value to use unsafe code, and we're not using image.Width, or image.Height, as those properties were adding to our runtime costs.
However, we're still stuck at a low speed. The problem is that our images are 640x480, so the middle of the loop is being called about 640x480x200 times. I'd like to ask if there's a way to speed it up somehow, or convince me that it's fast enough as it is. Perhaps a way is through some fast Matrix Addition, or is Matrix Addition inherently an n^2 operation with no way to speed it up?
Perhaps doing array accesses via unsafe code would speed it up, but I'm not sure how to go about doing it, and whether it would be worth the time. Probably not. Thanks.
EDIT : Thank you for all your answers.
This is the GetPixel method we're using:
public Color GetPixel(int x, int y)
{
int offsetFromOrigin = (y * this.stride) + (x * 3);
unsafe
{
return Color.FromArgb(this.imagePtr[offsetFromOrigin + 2], this.imagePtr[offsetFromOrigin + 1], this.imagePtr[offsetFromOrigin]);
}
}
Code profiling is the best place to start.
Matrix addition is a highly parallel operation and can be speed up by parallelizing the operation w/ multiple threads.
I would recommend using Intels IPP library that contains threaded highly optimized API for this sort of operation. Perhaps surprisingly it's only about $100 - but would add significant complexity to your project.
If you don't want to trouble yourself with mixed language programming and IPP, you could try out centerspace's C# math libraries. The NMath API contains easy to used, forward scaling, matrix operations.
Paul
matrix's addition complexity is
O(n^2)
, in number of additions.However, since there are no intermediate results, you can parallelize the additions using threads:
System.Drawing.Color is a structure, which on current versions of .NET kills most optimizations. Since you're only interested in the blue component anyway, use a method that only gets the data you need.
Now, exchange the order of iteration of x and y:
Now you're accessing all values within a scan line sequentially, which will make much better use of CPU cache for all three matrices involved (image.imagePtr, sumOfPixelValues, and sumOfPixelValuesSquared. [Thanks to Jon for noticing that when I fixed access to image.imagePtr, I broke the other two. Now the output array indexing is swapped to keep it optimal.]
Next, get rid of the member references. Another thread could theoretically be setting sumOfPixelValues to another array midway through, which does horrible horrible things to optimizations.
Now the compiler can generate optimal code for moving through the two output arrays, and after inlining and optimization, the inner loop can step through the image.imagePtr array with a stride of 3 instead of recalculating the offset all the time. Now an unsafe version for good measure, doing the optimizations that I think .NET ought to be smart enough to do but probably isn't:
This is a classic case of micro-optimisation failing horribly. You're not going to get anything from looking at that loop. To get real speed benefits you need to start off by looking at the big picture:-
EDIT:
Try having specialised image accessors so you're not wasting memory bandwidth:
or, better still:
and use the above in a loop like:
I'm not sure if it's faster but you may write something like;
Sometimes doing things in native C#, even unsafe calls, is just slower than using methods that have already been optimized.
No results guaranteed, but you may want to investigate the System.Windows.Media.Imaging name space and look at your whole problem in a different way.