Speed up Matrix Addition in C#

I'd like to optimize this piece of code:

public void PopulatePixelValueMatrices(GenericImage image, int Width, int Height)
{
    for (int x = 0; x < Width; x++)
    {
        for (int y = 0; y < Height; y++)
        {
            Byte pixelValue = image.GetPixel(x, y).B;
            this.sumOfPixelValues[x, y] += pixelValue;
            this.sumOfPixelValuesSquared[x, y] += pixelValue * pixelValue;
        }
    }
}

This is to be used for image processing, and we're currently running this for about 200 images. We've optimized the GetPixel method to use unsafe code, and we're not using image.Width or image.Height, as those properties were adding to our runtime costs.

However, we're still stuck at a low speed. The problem is that our images are 640x480, so the middle of the loop is being called about 640x480x200 times. I'd like to ask if there's a way to speed it up somehow, or convince me that it's fast enough as it is. Perhaps a way is through some fast Matrix Addition, or is Matrix Addition inherently an n^2 operation with no way to speed it up?

Perhaps doing array accesses via unsafe code would speed it up, but I'm not sure how to go about doing it, and whether it would be worth the time. Probably not. Thanks.

EDIT: Thank you for all your answers.

This is the GetPixel method we're using:

public Color GetPixel(int x, int y)
{
    int offsetFromOrigin = (y * this.stride) + (x * 3);
    unsafe
    {
        return Color.FromArgb(this.imagePtr[offsetFromOrigin + 2], this.imagePtr[offsetFromOrigin + 1], this.imagePtr[offsetFromOrigin]);
    }
}

Tags: c# image-processing

15 answers

爷的心禁止访问

#2 · 2020-05-27 04:10

Code profiling is the best place to start.
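As a rough starting point before reaching for a real profiler, something like the following could be used to time the loop. This is only a sketch: LoopTimer and AverageMilliseconds are made-up names, and a Stopwatch average says nothing about where inside the method the time goes.

```csharp
using System;
using System.Diagnostics;

public static class LoopTimer
{
    // Hypothetical helper: runs `work` `iterations` times and returns
    // the average elapsed time per run in milliseconds.
    public static double AverageMilliseconds(Action work, int iterations)
    {
        if (work == null) throw new ArgumentNullException(nameof(work));
        if (iterations <= 0) throw new ArgumentOutOfRangeException(nameof(iterations));

        // Warm-up run so that JIT compilation is not part of the measurement.
        work();

        Stopwatch stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            work();
        }
        stopwatch.Stop();

        return stopwatch.Elapsed.TotalMilliseconds / iterations;
    }
}
```

You would pass in a lambda that calls PopulatePixelValueMatrices for one image, and compare the average before and after each change, in a Release build and without a debugger attached.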

Matrix addition is a highly parallel operation and can be sped up by parallelizing it across multiple threads.

I would recommend Intel's IPP library, which contains threaded, highly optimized APIs for this sort of operation. Perhaps surprisingly, it's only about $100, but it would add significant complexity to your project.

If you don't want to trouble yourself with mixed-language programming and IPP, you could try CenterSpace's C# math libraries. The NMath API contains easy-to-use, forward-scaling matrix operations.

Paul

别忘想泡老子

#3 · 2020-05-27 04:10

Matrix addition has O(n^2) complexity in the number of additions.

However, since there are no intermediate results, you can parallelize the additions using threads:

it is easy to prove that the resulting algorithm will be lock-free
you can tune the optimal number of threads to use
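One way this might look, as a sketch only: each row is handed to Parallel.For as a separate work item, so no two threads ever write the same element and no locks are needed. ParallelAccumulator is a made-up name, and the layout is assumed: the blue channel is already extracted into a flat byte array of width * height values, and the output arrays are indexed [y, x] with element types uint and ulong.

```csharp
using System;
using System.Threading.Tasks;

public static class ParallelAccumulator
{
    // Hypothetical helper: adds one image's blue channel into the running sums.
    // Assumes `blue` holds width * height bytes in row-major order and that
    // both output arrays are indexed [y, x].
    public static void Accumulate(byte[] blue, int width, int height,
                                  uint[,] sums, ulong[,] squares)
    {
        // One work item per row: rows never overlap, so no locking is required.
        Parallel.For(0, height, y =>
        {
            int rowStart = y * width;
            for (int x = 0; x < width; x++)
            {
                uint value = blue[rowStart + x];
                sums[y, x] += value;
                squares[y, x] += value * value; // uint product, widened to ulong by +=
            }
        });
    }
}
```

The number of threads can be capped by using the Parallel.For overload that takes a ParallelOptions with MaxDegreeOfParallelism set. Whether this beats the single-threaded loop depends on the machine, so it is worth measuring.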


戒情不戒烟

#4 · 2020-05-27 04:13

System.Drawing.Color is a structure, which on current versions of .NET kills most optimizations. Since you're only interested in the blue component anyway, use a method that only gets the data you need.

public byte GetPixelBlue(int x, int y)
{
    int offsetFromOrigin = (y * this.stride) + (x * 3);
    unsafe
    {
        return this.imagePtr[offsetFromOrigin];
    }
}

Now, exchange the order of iteration of x and y:

public void PopulatePixelValueMatrices(GenericImage image,int Width, int Height)
{            
    for (int y = 0; y < Height; y++)
    {
        for (int x = 0; x < Width; x++)
        {
            Byte  pixelValue = image.GetPixelBlue(x, y);
            this.sumOfPixelValues[y, x] += pixelValue;
            this.sumOfPixelValuesSquared[y, x] += pixelValue * pixelValue;
        }
    }
}

Now you're accessing all values within a scan line sequentially, which will make much better use of CPU cache for all three matrices involved (image.imagePtr, sumOfPixelValues, and sumOfPixelValuesSquared). [Thanks to Jon for noticing that when I fixed access to image.imagePtr, I broke the other two. Now the output array indexing is swapped to keep it optimal.]

Next, get rid of the member references. Another thread could theoretically be setting sumOfPixelValues to another array midway through, which does horrible horrible things to optimizations.

public void PopulatePixelValueMatrices(GenericImage image,int Width, int Height)
{          
    uint [,] sums = this.sumOfPixelValues;
    ulong [,] squares = this.sumOfPixelValuesSquared;
    for (int y = 0; y < Height; y++)
    {
        for (int x = 0; x < Width; x++)
        {
            Byte  pixelValue = image.GetPixelBlue(x, y);
            sums[y, x] += pixelValue;
            squares[y, x] += (ulong)(pixelValue * pixelValue); // the int product needs an explicit cast to ulong
        }
    }
}

Now the compiler can generate optimal code for moving through the two output arrays, and after inlining and optimization, the inner loop can step through the image.imagePtr array with a stride of 3 instead of recalculating the offset all the time. Now an unsafe version for good measure, doing the optimizations that I think .NET ought to be smart enough to do but probably isn't:

unsafe public void PopulatePixelValueMatrices(GenericImage image, int Width, int Height)
{
    byte* scanline = image.imagePtr;
    // Element types follow the safe version above: uint sums, ulong squares.
    fixed (uint* sumsStart = &this.sumOfPixelValues[0, 0])
    fixed (ulong* squaresStart = &this.sumOfPixelValuesSquared[0, 0])
    {
        // Pointers declared by a fixed statement are read-only, so step through copies.
        uint* sums = sumsStart;
        ulong* squares = squaresStart;
        for (int y = 0; y < Height; y++)
        {
            byte* blue = scanline;
            for (int x = 0; x < Width; x++)
            {
                uint pixelValue = *blue;
                *sums += pixelValue;
                *squares += pixelValue * pixelValue;
                blue += 3;
                sums++;
                squares++;
            }
            scanline += image.stride;
        }
    }
}


够拽才男人

#5 · 2020-05-27 04:14

This is a classic case of micro-optimisation failing horribly. You're not going to get anything from looking at that loop. To get real speed benefits you need to start off by looking at the big picture:

Can you asynchronously preload image[n+1] whilst processing image[n]?
Can you load just the B channel from the image? This would reduce the memory bandwidth required.
Can you load the B value and update the sumOfPixelValues(Squared) arrays directly, i.e. read the file and update instead of read file, store, read, update? Again, this decreases memory bandwidth.
Can you use one dimensional arrays instead of two dimensional? Maybe create your own array class that works either way.
Perhaps you could look into using Mono and the SIMD extensions?
Can you process the image in chunks and assign them to idle CPUs in a multi-cpu environment?
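To illustrate the one-dimensional array idea, here is a minimal sketch. FlatAccumulator is a made-up name, and the layout is assumed: a managed 24-bit BGR byte buffer with stride bytes per row (blue first, as in the GetPixel in the question), and flat output arrays of width * height elements indexed as y * width + x.

```csharp
using System;

public static class FlatAccumulator
{
    // Hypothetical helper: reads only the blue byte of each pixel and
    // updates flat (one-dimensional) running sums in place.
    public static void Accumulate(byte[] bgr, int stride, int width, int height,
                                  uint[] sums, ulong[] squares)
    {
        int outIndex = 0;
        for (int y = 0; y < height; y++)
        {
            int inIndex = y * stride;       // start of this scan line, including any row padding
            for (int x = 0; x < width; x++)
            {
                uint blue = bgr[inIndex];   // blue is the first byte of each BGR pixel
                sums[outIndex] += blue;
                squares[outIndex] += blue * blue;
                inIndex += 3;               // step to the next pixel
                outIndex++;
            }
        }
    }
}
```

Both indices only ever move forward, so every array is walked sequentially and there is no per-pixel multiplication to recompute the offset.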

EDIT:

Try having specialised image accessors so you're not wasting memory bandwidth:

public byte GetBPixel (int x, int y)
{
    int offsetFromOrigin = (y * this.stride) + (x * 3);
    unsafe
    {
        // Blue is the first byte of each pixel, matching GetPixel in the question.
        return this.imagePtr [offsetFromOrigin];
    }
}

or, better still:

public byte GetBPixel (int offset)
{
    unsafe
    {
        // Blue is the first byte of each pixel, matching GetPixel in the question.
        return this.imagePtr [offset];
    }
}

and use the above in a loop like:

for (int start_offset = 0, y = 0 ; y < Height ; start_offset += stride, ++y)
{
   for (int x = 0, offset = start_offset ; x < Width ; offset += 3, ++x)
   {
      pixel = GetBPixel (offset);
      // do stuff
   }
}


太酷不给撩

#6 · 2020-05-27 04:18

I'm not sure if it's faster, but you could write something like:

public void PopulatePixelValueMatrices(GenericImage image,int Width, int Height)
{            
        Byte pixelValue;
        for (int x = 0; x < Width; x++)
        {
            for (int y = 0; y < Height; y++)
            {
                pixelValue = image.GetPixel(x, y).B;
                this.sumOfPixelValues[x, y] += pixelValue;
                this.sumOfPixelValuesSquared[x, y] += pixelValue * pixelValue;
            }
        }
}


欢心

#7 · 2020-05-27 04:20

Sometimes doing things in native C#, even unsafe calls, is just slower than using methods that have already been optimized.

No results guaranteed, but you may want to investigate the System.Windows.Media.Imaging namespace and look at your whole problem in a different way.
