I'm trying to implement a deconvolution layer for a convolutional network. By deconvolution I mean the following: suppose I feed a 3x227x227
input image to a layer with a filter of size 3x11x11
and stride 4, so the resulting feature map has size 55x55
. What I want to do is apply the reverse operation, projecting the 55x55
feature map back to a 3x227x227
image. Basically, each value in the 55x55
feature map is weighted by the 3x11x11
filter and projected into image space, and overlapping regions caused by the stride are averaged.
I tried to implement this in numpy without success. I found a solution with brute-force nested for loops, but it is painfully slow. How can I implement it efficiently in numpy? Any help is welcome.
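For reference, here is a minimal sketch of the brute-force version I mean (dimensions and names are just illustrative; overlaps are averaged via a count array):

```python
import numpy as np

def deconv_naive(fmap, w, stride):
    """Project a 2-D feature map back to image space with one filter.

    fmap: (out_h, out_w) feature map, w: (C, HH, WW) filter.
    Overlapping contributions are averaged.
    """
    out_h, out_w = fmap.shape
    C, HH, WW = w.shape
    H = (out_h - 1) * stride + HH   # 227 when out_h=55, stride=4, HH=11
    W = (out_w - 1) * stride + WW
    img = np.zeros((C, H, W))
    count = np.zeros((C, H, W))
    for i in range(out_h):
        for j in range(out_w):
            hs, ws = i * stride, j * stride
            # Each feature-map value scales the whole filter and is
            # pasted into its receptive field in image space
            img[:, hs:hs + HH, ws:ws + WW] += fmap[i, j] * w
            count[:, hs:hs + HH, ws:ws + WW] += 1
    # Average the overlapping regions (guard against uncovered pixels)
    return img / np.maximum(count, 1)
```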
As discussed in this question, a deconvolution is just a convolutional layer, but with a particular choice of padding, stride and filter size.
For example, if your current image size is 55x55
, you can apply a convolution with padding=20
, stride=1
and filter=[21x21]
to obtain a 75x75
image, then 95x95
and so on. (I'm not saying this choice of numbers gives the desired quality of the output image, just the size. Actually, I think downsampling from 227x227
to 55x55
and then upsampling back to 227x227
is too aggressive, but you are free to try any architecture).
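As a quick sanity check on those sizes, the standard output-size formula for a convolution is out = (in + 2*pad - filter) // stride + 1. A tiny helper (the name is mine, purely for illustration) reproduces the chain above:

```python
def conv_out_size(n, f, pad, stride):
    # Standard convolution output-size formula
    return (n + 2 * pad - f) // stride + 1

print(conv_out_size(227, 11, 0, 4))   # 55: the original downsampling
print(conv_out_size(55, 21, 20, 1))   # 75
print(conv_out_size(75, 21, 20, 1))   # 95
```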
Here's an implementation of a forward pass for arbitrary stride and padding. It performs an im2col transformation, but uses stride_tricks
from numpy instead of explicit copying. It's not as optimized as modern GPU implementations, but it's definitely faster than four nested loops:
import numpy as np

def conv_forward(x, w, b, stride, pad):
    N, C, H, W = x.shape
    F, _, HH, WW = w.shape

    # Check dimensions
    assert (W + 2 * pad - WW) % stride == 0, 'width does not work'
    assert (H + 2 * pad - HH) % stride == 0, 'height does not work'

    # Pad the input
    p = pad
    x_padded = np.pad(x, ((0, 0), (0, 0), (p, p), (p, p)), mode='constant')

    # Figure out output dimensions
    H += 2 * pad
    W += 2 * pad
    out_h = (H - HH) // stride + 1   # integer division so shapes stay ints in Python 3
    out_w = (W - WW) // stride + 1

    # Perform an im2col operation by picking clever strides
    shape = (C, HH, WW, N, out_h, out_w)
    strides = (H * W, W, 1, C * H * W, stride * W, stride)
    strides = x.itemsize * np.array(strides)
    x_stride = np.lib.stride_tricks.as_strided(x_padded,
                                               shape=shape, strides=strides)
    x_cols = np.ascontiguousarray(x_stride)
    x_cols.shape = (C * HH * WW, N * out_h * out_w)

    # Now all our convolutions are a big matrix multiply
    res = w.reshape(F, -1).dot(x_cols) + b.reshape(-1, 1)

    # Reshape the output
    res.shape = (F, N, out_h, out_w)
    out = res.transpose(1, 0, 2, 3)
    out = np.ascontiguousarray(out)
    return out
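If you want to convince yourself that the stride arithmetic is right, here is a small self-contained check (toy dimensions, illustrative names) that builds the same im2col view with as_strided and compares each window against plain slicing:

```python
import numpy as np

N, C, H, W = 1, 2, 5, 5          # tiny, already-padded input
HH, WW, stride = 3, 3, 2
x = np.arange(N * C * H * W, dtype=np.float64).reshape(N, C, H, W)

out_h = (H - HH) // stride + 1   # 2
out_w = (W - WW) // stride + 1   # 2

# Same shape/strides recipe as in conv_forward above
shape = (C, HH, WW, N, out_h, out_w)
strides = np.array((H * W, W, 1, C * H * W, stride * W, stride)) * x.itemsize
cols = np.lib.stride_tricks.as_strided(x, shape=shape, strides=strides)

# Every (i, j) slice of the view must equal the corresponding input window
for i in range(out_h):
    for j in range(out_w):
        window = x[:, :, i*stride:i*stride + HH, j*stride:j*stride + WW]
        assert np.array_equal(cols[:, :, :, :, i, j].transpose(3, 0, 1, 2),
                              window)
```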