I'm trying to develop a deconvolutional layer (or a transposed convolutional layer, to be precise).
In the forward pass, I do a full convolution (convolution with zero padding). In the backward pass, I do a valid convolution (convolution without padding) to pass the errors to the previous layer.
The gradients of the biases are easy to compute, simply a matter of averaging over the superfluous dimensions.
The problem is that I don't know how to update the weights of the convolutional filters. What are the gradients? I'm sure it's a convolution operation, but I don't see how. I tried a valid convolution of the inputs with the errors, but to no avail.
Deconvolution explained
First of all, a deconvolution is just a convolutional layer used for a different purpose, namely upsampling (why this is useful is explained in this paper).
For example, here a 2x2 input image (bottom image, in blue) is upsampled to 4x4 (top image, in green). To make it a valid convolution, the input is first padded to 6x6, after which a 3x3 filter is applied without striding. Just like in an ordinary convolutional layer, you can choose different padding/striding strategies to produce the image size you want.

Backward pass
Now it should be clear that the backward pass for deconvolution is a special case of the backward pass for a convolutional layer, with a particular stride and padding. I think you've done it already, but here's a naive (and not very efficient) implementation for any stride and padding:
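The original code isn't reproduced here, so below is a minimal NumPy sketch of such a naive implementation. It's a single-channel version with arbitrary stride and zero padding; the function names and signatures are my own, not from the original post:

```python
import numpy as np

def conv_forward(x, w, b, stride=1, pad=0):
    """Naive single-channel convolution (strictly, cross-correlation):
    zero-pad x, then slide the filter w over it with the given stride."""
    x_p = np.pad(x, pad)
    kh, kw = w.shape
    oh = (x_p.shape[0] - kh) // stride + 1
    ow = (x_p.shape[1] - kw) // stride + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            patch = x_p[i*stride:i*stride+kh, j*stride:j*stride+kw]
            out[i, j] = np.sum(patch * w) + b
    return out

def conv_backward(dout, x, w, stride=1, pad=0):
    """Gradients of conv_forward w.r.t. the input, weights and bias,
    given dout = dL/d(output)."""
    x_p = np.pad(x, pad)
    kh, kw = w.shape
    dx_p = np.zeros_like(x_p)
    dw = np.zeros_like(w)
    db = np.sum(dout)  # the bias touches every output pixel
    for i in range(dout.shape[0]):
        for j in range(dout.shape[1]):
            hs, ws = i * stride, j * stride
            patch = x_p[hs:hs+kh, ws:ws+kw]
            dw += dout[i, j] * patch                     # weight grad: input patches weighted by errors
            dx_p[hs:hs+kh, ws:ws+kw] += dout[i, j] * w   # scatter errors back through the filter
    # drop the padded border to get the gradient w.r.t. the original input
    h, w_ = x.shape
    dx = dx_p[pad:pad+h, pad:pad+w_]
    return dx, dw, db
```

Note that `dw` accumulates the errors multiplied by the corresponding input patches, which is exactly a correlation of the (padded) input with the errors.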
The same can be done more efficiently using `im2col` and `col2im`, but that's just an implementation detail. Another funny fact: the backward pass for a convolution operation (for both the data and the weights) is again a convolution, but with spatially-flipped filters.

Here's how it's applied (plain simple SGD):
As you can see, it's pretty straightforward; you just need to understand that you're applying the same old convolution.