Naive Implementation of Convolution algorithm

Posted 2020-07-23 05:30

Question:

I'm currently learning about computer vision and machine learning through Stanford's free online course CS131. I came across some heavy math formulas and was wondering if anyone could explain how to implement a naive convolution with four nested for loops, given only the image height and width and the kernel height and width. I came up with the following solution by researching online.

image_padded = np.zeros((image.shape[0] + 2, image.shape[1] + 2))
image_padded[1:-1, 1:-1] = image
for x in range(image.shape[1]):  # Loop over every pixel of the image
    for y in range(image.shape[0]):
        # element-wise multiplication of the kernel and the image
        out[y, x] = (kernel * image_padded[y:y + 3, x:x + 3]).sum()

I understand this version from website examples that use the same approach; however, I can't grasp how four nested for loops would do it. If you could, please also break the formula down into something more digestible than the mathematical equation found online.

Edit: Just to clarify, while the code snippet above works to a certain degree, I'm trying to come up with a solution that's a bit less optimized and a bit more beginner-friendly, such as what this code is asking for:

def conv_nested(image, kernel):
    """A naive implementation of convolution filter.

    This is a naive implementation of convolution using 4 nested for-loops.
    This function computes convolution of an image with a kernel and outputs
    the result that has the same shape as the input image.

    Args:
        image: numpy array of shape (Hi, Wi)
        kernel: numpy array of shape (Hk, Wk)

    Returns:
        out: numpy array of shape (Hi, Wi)
    """
    Hi, Wi = image.shape
    Hk, Wk = kernel.shape
    out = np.zeros((Hi, Wi))
    ### YOUR CODE HERE

    ### END YOUR CODE

    return out

Answer 1:

For this task scipy.signal.correlate2d is your friend.

Demo

I wrapped your code in a function named naive_correlation:

import numpy as np

def naive_correlation(image, kernel):
    image_padded = np.zeros((image.shape[0] + 2, image.shape[1] + 2))
    image_padded[1:-1, 1:-1] = image
    out = np.zeros_like(image)
    for x in range(image.shape[1]):
        for y in range(image.shape[0]):
            out[y, x] = (kernel * image_padded[y:y + 3, x:x + 3]).sum()
    return out

Notice that your snippet throws an error because out is not initialized.

In [67]: from scipy.signal import correlate2d

In [68]: img = np.array([[3, 9, 5, 9],
    ...:                 [1, 7, 4, 3],
    ...:                 [2, 1, 6, 5]])
    ...: 

In [69]: kernel = np.array([[0, 1, 0],
    ...:                    [0, 0, 0],
    ...:                    [0, -1, 0]])
    ...: 

In [70]: res1 = correlate2d(img, kernel, mode='same')

In [71]: res1
Out[71]: 
array([[-1, -7, -4, -3],
       [ 1,  8, -1,  4],
       [ 1,  7,  4,  3]])

In [72]: res2 = naive_correlation(img, kernel)

In [73]: np.array_equal(res1, res2)
Out[73]: True

If you wish to perform convolution rather than correlation you could use convolve2d.
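As a quick check of how the two operations relate, the sketch below reuses the img and kernel arrays from the demo and compares convolve2d against correlate2d applied to a kernel flipped along both axes. The variable names conv, flipped, and corr_flipped are only illustrative, and the equality is assumed to hold for odd-sized kernels with mode='same'.

```python
import numpy as np
from scipy.signal import convolve2d, correlate2d

# Same arrays as in the demo above
img = np.array([[3, 9, 5, 9],
                [1, 7, 4, 3],
                [2, 1, 6, 5]])
kernel = np.array([[0, 1, 0],
                   [0, 0, 0],
                   [0, -1, 0]])

# Convolution of the image with the kernel
conv = convolve2d(img, kernel, mode='same')

# Correlation with the kernel flipped vertically and horizontally
flipped = kernel[::-1, ::-1]
corr_flipped = correlate2d(img, flipped, mode='same')

# For this odd-sized kernel the two results should be identical
print(np.array_equal(conv, corr_flipped))
```

In other words, convolution can be read as correlation with a kernel rotated by 180 degrees, which is why the two give the same result for symmetric kernels.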

Edit

Is this what you are looking for?

def explicit_correlation(image, kernel):
    hi, wi = image.shape
    hk, wk = kernel.shape
    image_padded = np.zeros(shape=(hi + hk - 1, wi + wk - 1))
    # Use explicit end indices: -hk//2 parses as (-hk)//2, which is -2 for hk = 3
    image_padded[hk//2:hk//2 + hi, wk//2:wk//2 + wi] = image
    out = np.zeros(shape=image.shape)
    for row in range(hi):
        for col in range(wi):
            for i in range(hk):
                for j in range(wk):
                    out[row, col] += image_padded[row + i, col + j]*kernel[i, j]
    return out
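Since the question asks for convolution rather than correlation, here is a minimal sketch of how the conv_nested stub might be filled in. It assumes zero padding and an odd-sized kernel, and it skips out-of-bounds pixels with a bounds check instead of building a padded copy; that choice is mine, not something the course requires. In words, each output pixel is the sum over i, j of kernel[i, j] * image[m + Hk//2 - i, n + Wk//2 - j], so the kernel is applied flipped about its centre.

```python
import numpy as np

def conv_nested(image, kernel):
    """Naive same-size 2D convolution with zero padding (illustrative sketch)."""
    Hi, Wi = image.shape
    Hk, Wk = kernel.shape
    out = np.zeros((Hi, Wi))
    for m in range(Hi):              # output row
        for n in range(Wi):          # output column
            for i in range(Hk):      # kernel row
                for j in range(Wk):  # kernel column
                    # Convolution flips the kernel: kernel[i, j] pairs with
                    # the pixel mirrored about the window centre.
                    r = m + Hk // 2 - i
                    c = n + Wk // 2 - j
                    # Pixels outside the image count as zero (zero padding)
                    if 0 <= r < Hi and 0 <= c < Wi:
                        out[m, n] += kernel[i, j] * image[r, c]
    return out

# Usage with the arrays from the demo above
img = np.array([[3, 9, 5, 9],
                [1, 7, 4, 3],
                [2, 1, 6, 5]])
kernel = np.array([[0, 1, 0],
                   [0, 0, 0],
                   [0, -1, 0]])
result = conv_nested(img, kernel)
print(result)
```

For this kernel the result is the negative of the correlation output shown earlier, because flipping the kernel swaps the +1 and -1 entries.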