Currently learning about computer vision and machine learning through the free online course by stanford CS131. Came across some heavy math formulas and was wondering if anyone could explain to me how one would go on about in implementing a naive 4 nested for loops for the convolution algorithm using only knowing the image height, width and kernel height and width. I was able to come up with this solution by researching online.
image_padded = np.zeros((image.shape[0] + 2, image.shape[1] + 2))
image_padded[1:-1, 1:-1] = image
for x in range(image.shape[1]): # Loop over every pixel of the image
for y in range(image.shape[0]):
# element-wise multiplication of the kernel and the image
out[y, x] = (kernel * image_padded[y:y + 3, x:x + 3]).sum()
I was able to understand this based on some website examples using this type of algorithm however, I can't seem to grasp how a 4 nested for loops would do it. And if you could, break down the formula into something more digestible then the given mathematical equation found online.
Edit: Just to clarify while the code snippet I left works to a certain degree I'm trying to come up with a solution that's a bit less optimized and a bit more beginner friendly such as what this code is asking:
def conv_nested(image, kernel):
"""A naive implementation of convolution filter.
This is a naive implementation of convolution using 4 nested for-loops.
This function computes convolution of an image with a kernel and outputs
the result that has the same shape as the input image.
Args:
image: numpy array of shape (Hi, Wi)
kernel: numpy array of shape (Hk, Wk)
Returns:
out: numpy array of shape (Hi, Wi)
"""
Hi, Wi = image.shape
Hk, Wk = kernel.shape
out = np.zeros((Hi, Wi))
### YOUR CODE HERE
### END YOUR CODE
return out