Optimized float Blur variations

2019-03-02 07:54发布

问题:

I am looking for optimized functions in c++ for calculating areal averages of floats. the function is passed a source float array, a destination float array (same size as source array), array width and height, "blurring" area width and height.

The function should "wrap-around" edges for the blurring/averages calculations.

Here is example code that blur with a rectangular shape:

/*****************************************
*   Find averages extended variations
*****************************************/

void findaverages_ext(float *floatdata, float *dest_data, int fwidth, int fheight, int scale, int aw, int ah, int weight, int xoff, int yoff)
{
printf("findaverages_ext scale: %d, width: %d, height: %d, weight: %d \n", scale, aw, ah, weight);

float total = 0.0;
int spos = scale * fwidth * fheight;
int apos;

int w = aw;
int h = ah;

float* f_temp       = new float[fwidth * fheight];

// Horizontal
for(int y=0;y<fheight   ;y++)
{
    Sleep(10);      // Do not burn your processor 

    total = 0.0;

    // Process entire window for first pixel (including wrap-around edge)
    for (int kx = 0; kx <= w; ++kx)
        if (kx >= 0 && kx < fwidth)
            total += floatdata[y*fwidth + kx];
    // Wrap
    for (int kx = (fwidth-w); kx < fwidth; ++kx)
        if (kx >= 0 && kx < fwidth)
            total += floatdata[y*fwidth + kx];

    // Store first window
    f_temp[y*fwidth] = (total / (w*2+1));

    for(int x=1;x<fwidth    ;x++)           // x width changes with y
    {
        // Substract pixel leaving window
        if (x-w-1 >= 0)
            total -= floatdata[y*fwidth + x-w-1];

        // Add pixel entering window
        if (x+w < fwidth)
            total += floatdata[y*fwidth + x+w];
        else
            total += floatdata[y*fwidth + x+w-fwidth];

        // Store average
        apos = y * fwidth + x;
        f_temp[apos] = (total / (w*2+1));
    }
}


// Vertical
for(int x=0;x<fwidth    ;x++)
{
    Sleep(10);      // Do not burn your processor 

    total = 0.0;

    // Process entire window for first pixel
    for (int ky = 0; ky <= h; ++ky)             
        if (ky >= 0 && ky < fheight)
            total += f_temp[ky*fwidth + x];
    // Wrap
    for (int ky = fheight-h; ky < fheight; ++ky)                
        if (ky >= 0 && ky < fheight)
            total += f_temp[ky*fwidth + x];

    // Store first if not out of bounds
    dest_data[spos + x] = (total / (h*2+1));

    for(int y=1;y< fheight  ;y++)           // y width changes with x
    {
        // Substract pixel leaving window
        if (y-h-1 >= 0)
            total -= f_temp[(y-h-1)*fwidth + x];

        // Add pixel entering window
        if (y+h < fheight)
            total += f_temp[(y+h)*fwidth + x];
        else
            total += f_temp[(y+h-fheight)*fwidth + x];

        // Store average
        apos = y * fwidth + x;
        dest_data[spos+apos] = (total / (h*2+1));
    }
}

delete f_temp;
}

What I need is similar functions that for each pixel finds the average (blur) of pixels from shapes different than rectangular.

The specific shapes are: "S" (sharp edges), "O" (rectangular but hollow), "+" and "X", where the average float is stored at the center pixel on destination data array. Size of blur shape should be variable, width and height.

The functions does not need to be pixelperfect, only optimized for performance. There could be separate functions for each shape.

I am also happy if anyone can tip me of how to optimize the example function above for rectangluar blurring.

回答1:

What you are trying to implement are various sorts of digital filters for image processing. This is equivalent to convolving two signals where the 2nd one would be the filter's impulse response. So far, you regognized that a "rectangular average" is separable. By separable I mean, you can split the filter into two parts. One that operates along the X axis and one that operates along the Y axis -- in each case a 1D filter. This is nice and can save you lots of cycles. But not every filter is separable. Averaging along other shapres (S, O, +, X) is not separable. You need to actually compute a 2D convolution for these.

As for performance, you can speed up your 1D averages by properly implementing a "moving average". A proper "moving average" implementation only requires a fixed amount of little work per pixel regardless of the averaging "window". This can be done by recognizing that neighbouring pixels of the target image are computed by an average of almost the same pixels. You can reuse these sums for the neighbouring target pixel by adding one new pixel intensity and subtracting an older one (for the 1D case).

In case of arbitrary non-separable filters your best bet performance-wise is "fast convolution" which is FFT-based. Checkout www.dspguide.com. If I recall correctly, there is even a chapter on how to properly do "fast convolution" using the FFT algorithm. Although, they explain it for 1-dimensional signals, it also applies to 2-dimensional signals. For images you have to perform 2D-FFT/iFFT transforms.



回答2:

To add to sellibitze's answer, you can use a summed area table for your O, S and + kernels (not for the X one though). That way you can convolve a pixel in constant time, and it's probably the fastest method to do it for kernel shapes that allow it.

Basically, a SAT is a data structure that lets you calculate the sum of any axis-aligned rectangle. For the O kernel, after you've built a SAT, you'd take the sum of the outer rect's pixels and subtract the sum of the inner rect's pixels. The S and + kernels can be implemented similarly.

For the X kernel you can use a different approach. A skewed box filter is separable:

You can convolve with two long, thin skewed box filters, then add the two resulting images together. The center of the X will be counted twice, so will you need to convolve with another skewed box filter, and subtract that.

Apart from that, you can optimize your box blur in many ways.

  • Remove the two ifs from the inner loop by splitting that loop into three loops - two short loops that do checks, and one long loop that doesn't. Or you could pad your array with extra elements from all directions - that way you can simplify your code.
  • Calculate values like h * 2 + 1 outside the loops.
  • An expression like f_temp[ky*fwidth + x] does two adds and one multiplication. You can initialize a pointer to &f_temp[ky*fwidth] outside the loop, and just increment that pointer in the loop.
  • Don't do the division by h * 2 + 1 in the horizontal step. Instead, divide by the square of that in the vertical step.