I'm trying to implement a deconvolution layer for a convolutional network. By deconvolution I mean the following: suppose I feed a 3x227x227
input image to a layer with a filter of size 3x11x11
and stride 4, so the resulting feature map has size 55x55
. What I want to do is apply the reverse operation, projecting the 55x55
feature map back to a 3x227x227
image. Basically, each value in the 55x55
feature map is weighted by the 3x11x11
filter and projected into image space, and overlapping regions caused by the stride are averaged.
I tried to implement this in numpy without success. I found a solution with brute-force nested for loops, but it is painfully slow. How can I implement it efficiently in numpy? Any help is welcome.
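For reference, here is a minimal sketch of the brute-force version I mean (dimensions and names are just illustrative; overlaps are averaged via a count array):

```python
import numpy as np

def deconv_naive(fmap, w, stride):
    """Project a 2-D feature map back to image space with one filter.

    fmap: (out_h, out_w) feature map, w: (C, HH, WW) filter.
    Overlapping contributions are averaged.
    """
    out_h, out_w = fmap.shape
    C, HH, WW = w.shape
    H = (out_h - 1) * stride + HH   # 227 when out_h=55, stride=4, HH=11
    W = (out_w - 1) * stride + WW
    img = np.zeros((C, H, W))
    count = np.zeros((C, H, W))
    for i in range(out_h):
        for j in range(out_w):
            hs, ws = i * stride, j * stride
            # Each feature-map value scales the whole filter and is
            # pasted into its receptive field in image space
            img[:, hs:hs + HH, ws:ws + WW] += fmap[i, j] * w
            count[:, hs:hs + HH, ws:ws + WW] += 1
    # Average the overlapping regions (guard against uncovered pixels)
    return img / np.maximum(count, 1)
```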
As discussed in this question, a deconvolution is just a convolutional layer, but with a particular choice of padding, stride and filter size.
For example, if your current image size is 55x55
, you can apply a convolution with padding=20
, stride=1
and filter=[21x21]
to obtain a 75x75
image, then 95x95
and so on. (I'm not saying this choice of numbers gives the desired quality of the output image, just the size. Actually, I think downsampling from 227x227
to 55x55
and then upsampling back to 227x227
is too aggressive, but you are free to try any architecture).
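As a quick sanity check on those sizes, the standard output-size formula for a convolution is out = (in + 2*pad - filter) // stride + 1. A tiny helper (the name is mine, purely for illustration) reproduces the chain above:

```python
def conv_out_size(n, f, pad, stride):
    # Standard convolution output-size formula
    return (n + 2 * pad - f) // stride + 1

print(conv_out_size(227, 11, 0, 4))   # 55: the original downsampling
print(conv_out_size(55, 21, 20, 1))   # 75
print(conv_out_size(75, 21, 20, 1))   # 95
```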
Here's an implementation of a forward pass for arbitrary stride and padding. It performs an im2col transformation, but uses stride_tricks
from numpy instead of explicit copying. It's not as optimized as modern GPU implementations, but it's definitely faster than four nested loops:
import numpy as np

def conv_forward(x, w, b, stride, pad):
    N, C, H, W = x.shape
    F, _, HH, WW = w.shape

    # Check dimensions
    assert (W + 2 * pad - WW) % stride == 0, 'width does not work'
    assert (H + 2 * pad - HH) % stride == 0, 'height does not work'

    # Pad the input
    p = pad
    x_padded = np.pad(x, ((0, 0), (0, 0), (p, p), (p, p)), mode='constant')

    # Figure out output dimensions
    H += 2 * pad
    W += 2 * pad
    out_h = (H - HH) // stride + 1   # integer division so shapes stay ints in Python 3
    out_w = (W - WW) // stride + 1

    # Perform an im2col operation by picking clever strides
    shape = (C, HH, WW, N, out_h, out_w)
    strides = (H * W, W, 1, C * H * W, stride * W, stride)
    strides = x.itemsize * np.array(strides)
    x_stride = np.lib.stride_tricks.as_strided(x_padded,
                                               shape=shape, strides=strides)
    x_cols = np.ascontiguousarray(x_stride)
    x_cols.shape = (C * HH * WW, N * out_h * out_w)

    # Now all our convolutions are a big matrix multiply
    res = w.reshape(F, -1).dot(x_cols) + b.reshape(-1, 1)

    # Reshape the output
    res.shape = (F, N, out_h, out_w)
    out = res.transpose(1, 0, 2, 3)
    out = np.ascontiguousarray(out)
    return out
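If you want to convince yourself that the stride arithmetic is right, here is a small self-contained check (toy dimensions, illustrative names) that builds the same im2col view with as_strided and compares each window against plain slicing:

```python
import numpy as np

N, C, H, W = 1, 2, 5, 5          # tiny, already-padded input
HH, WW, stride = 3, 3, 2
x = np.arange(N * C * H * W, dtype=np.float64).reshape(N, C, H, W)

out_h = (H - HH) // stride + 1   # 2
out_w = (W - WW) // stride + 1   # 2

# Same shape/strides recipe as in conv_forward above
shape = (C, HH, WW, N, out_h, out_w)
strides = np.array((H * W, W, 1, C * H * W, stride * W, stride)) * x.itemsize
cols = np.lib.stride_tricks.as_strided(x, shape=shape, strides=strides)

# Every (i, j) slice of the view must equal the corresponding input window
for i in range(out_h):
    for j in range(out_w):
        window = x[:, :, i*stride:i*stride + HH, j*stride:j*stride + WW]
        assert np.array_equal(cols[:, :, :, :, i, j].transpose(3, 0, 1, 2),
                              window)
```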