In TensorFlow (and deep learning in general), can I apply a fully connected layer to patches as a convolution?

Posted on 2019-08-05 20:20

Question:

Could anyone confirm my reasoning?

Let's say I have a (pre-trained) fully connected layer fc that takes b x 20 x 20 x 10 as input and produces b x 64 as output, where b is the batch size.

Now I have an input of c x 100 x 60 x 10. The height and width 100 x 60 can be subdivided into a 5 x 3 grid of 20 x 20 patches. I would like to get a 5 x 3 grid of local responses (outputs) from the fc layer, i.e., c x 5 x 3 x 64.

Now I am thinking: doing this is the same as having a convolution layer whose weights come from fc, with a stride of 20 in both width and height. Is that correct? Could there be a difference?

Answer 1:

Yes, it will be the same if appropriate reshaping of the dense layer weight matrix is performed.

Let us first look at the dense layer. You feed a 20 x 20 x 10 input to the dense layer. It is first flattened into a 4000 x 1 vector. You want the output to be a 64 x 1 vector, so the required weight matrix is 4000 x 64, plus 64 bias parameters. Then y = w^T * x + b = [4000 x 64]^T * [4000 x 1] + [64 x 1] yields a [64 x 1] vector. Therefore, y[i] = w[0][i]*x[0] + ... + w[3999][i]*x[3999] + b[i] for i in [0, 63], where b[i] is a bias parameter.
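
As a small sanity check, here is a minimal NumPy sketch of the dense-layer arithmetic described above (my own illustration, not part of the original answer; shapes follow the 20 x 20 x 10 -> 64 example):

import numpy as np

x = np.random.random((20, 20, 10))   # one input sample
w = np.random.random((4000, 64))     # dense weight matrix
b = np.random.random(64)             # bias vector

x_flat = x.ravel()                   # flatten to a 4000-vector
y = w.T @ x_flat + b                 # y[i] = w[0][i]*x[0] + ... + w[3999][i]*x[3999] + b[i]

print(y.shape)                       # (64,)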

Let us turn to convolution. To produce a 5 x 3 x 64 output from an input of size 100 x 60 x 10, you need 64 filters, each of spatial size (20, 20), with strides (20, 20) and no zero-padding. Each 20 x 20 filter, however, has local connectivity extending along the entire depth, i.e. a neuron is connected to all 10 channels along the depth of the input. Please read this for more information on the local connectivity of a convolutional layer.
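
Since the question mentions TensorFlow: as a hedged sketch (my addition, not from the original answer), the corresponding layer could be configured in Keras as below, assuming a channels-last input of shape (100, 60, 10).

import tensorflow as tf

conv = tf.keras.layers.Conv2D(filters=64,
                              kernel_size=(20, 20),
                              strides=(20, 20),
                              padding='valid')   # no zero-padding

x = tf.random.normal((1, 100, 60, 10))           # batch of one, channels-last
y = conv(x)
print(y.shape)                                   # (1, 5, 3, 64)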

Your convolutional layer has a receptive field of 20 x 20. Each neuron in the convolutional layer is connected to a 20 x 20 x 10 block of the input, which gives 4000 weights (and one bias parameter) per filter. You have 64 such filters, so the total number of learnable weights for this layer is 4000 x 64 + 64. The convolution between one 20 x 20 x 10 block of x and w (size = 64 x 10 x 20 x 20) can be performed as:

convResult = np.sum(np.sum(np.sum(x*w[:,:,::-1,::-1], axis=-1), axis=-1),axis=-1)

There are some fine points here. I did w[:,:,::-1,::-1] because Theano convolution flips the convolution kernel (well, it's not quite that simple!). If you are interested in who flips and who does not, read this.
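
To illustrate the flipping issue concretely (my own example, assuming SciPy is available): "true" convolution flips the kernel, cross-correlation does not, so the two agree only when the kernel is flipped first.

import numpy as np
from scipy.signal import convolve2d, correlate2d

x = np.random.random((20, 20))
k = np.random.random((20, 20))

conv = convolve2d(x, k, mode='valid')                # flips k internally
corr = correlate2d(x, k[::-1, ::-1], mode='valid')   # no flip, so flip k by hand

print(np.allclose(conv, corr))                       # True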

Finally, the dense layer and the convolution layer (in this context) essentially do the same operation: they element-wise multiply two sets of 4000 values and sum them up. This procedure is repeated 64 times to produce a 64 x 1 vector. So it is possible to achieve exactly the same result with a dense and a convolution layer by properly reshaping the dense layer's weight matrix. However, you need to take care of kernel flipping to match the results.
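
Here is a minimal NumPy sketch of that reshaping (my own illustration, assuming a channels-first (10, 20, 20) block flattened in C order, as in the code below): the dense weight matrix is reshaped into 64 filters and spatially flipped, so that a flipping convolution such as Theano's undoes the flip, and both paths give the same numbers.

import numpy as np

x = np.random.random((10, 20, 20))        # one input block
w_dense = np.random.random((4000, 64))    # dense weights: 4000 inputs -> 64 outputs
b = np.random.random(64)                  # bias

# dense layer: flatten, then y = w^T * x + b
y_dense = w_dense.T @ x.ravel() + b

# conv filters: one (10, 20, 20) kernel per output unit, pre-flipped spatially
w_conv = w_dense.T.reshape(64, 10, 20, 20)[:, :, ::-1, ::-1]

# manual "true" convolution at this single location: flip back, multiply, sum
y_conv = np.sum(x * w_conv[:, :, ::-1, ::-1], axis=(-1, -2, -3)) + b

print(np.allclose(y_dense, y_conv))       # True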

Below I give a code snippet that computes the convolution manually (using NumPy) and with Theano.

import theano
from theano import tensor as T
import numpy as np

X = T.ftensor4('X')
W = T.ftensor4('W')
out = T.nnet.conv2d(X,W)
f = theano.function([X, W], out, allow_input_downcast=True)

x = np.random.random((1,10,20,20))   # one input block: (batch, channels, height, width)
w = np.random.random((64,10,20,20))  # 64 filters: (out_channels, in_channels, height, width)

# convolution using Theano
c1 = np.squeeze(f(x,w)[0])

# convolution using Numpy
c2 = np.sum(np.sum(np.sum(x*w[:,:,::-1,::-1],axis=-1),axis=-1),axis=-1)

# check that both are almost identical
print(np.amax(np.abs(c2 - c1)))
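
For completeness, a hedged sketch of the same check in TensorFlow (my addition, not from the original answer). tf.nn.conv2d performs cross-correlation, i.e. it does not flip the kernel, so no manual flipping is needed to match the element-wise sum.

import numpy as np
import tensorflow as tf

x = np.random.random((1, 20, 20, 10)).astype(np.float32)    # NHWC input
w = np.random.random((64, 10, 20, 20)).astype(np.float32)   # same layout as above
w_tf = np.transpose(w, (2, 3, 1, 0))                        # -> (20, 20, 10, 64): [h, w, in, out]

c1 = tf.nn.conv2d(x, w_tf, strides=1, padding='VALID')      # shape (1, 1, 1, 64)
c1 = np.squeeze(c1.numpy())

x_chw = np.transpose(x[0], (2, 0, 1))                       # -> (10, 20, 20)
c2 = np.sum(x_chw * w, axis=(-1, -2, -3))                   # no kernel flip

print(np.amax(np.abs(c1 - c2)))                             # small (float32 round-off)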