In this tutorial, the output volumes are stated in output [25], and the receptive fields are specified in output [26].
Okay, the input volume [3, 227, 227] gets convolved with the region of size [3, 11, 11].
Using this formula (W−F+2P)/S+1, where:
W = the input volume size
F = the receptive field size
P = padding
S = stride
...this results in (227 - 11)/4 + 1 = 55, i.e. [55*55*96]. So far so good :)
For 'pool1' they used F=3 and S=2, I think? The calculation checks out: (55-3)/2+1 = 27.
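The two calculations above can be double-checked with a few lines of Python. This is only a sketch: the helper name `output_size` is made up for illustration, and it assumes the division comes out even (it simply rounds down otherwise).

```python
def output_size(w, f, p=0, s=1):
    """Spatial output size from the formula (W - F + 2P)/S + 1.

    w: input width/height, f: receptive field size,
    p: padding, s: stride. Hypothetical helper, not a Caffe API.
    """
    return (w - f + 2 * p) // s + 1


# 'conv1': 227x227 input, 11x11 field, no padding, stride 4
conv1 = output_size(227, 11, p=0, s=4)

# 'pool1': 55x55 input, F=3 and S=2 as guessed in the question
pool1 = output_size(conv1, 3, p=0, s=2)

print(conv1, pool1)  # 55 27
```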
From this point I get a bit confused. The receptive field for the second convnet layer is [48, 5, 5], yet the output for 'conv2' is equal to [256, 27, 27]. What calculation happened here?
And then, the height and width of the output volumes of 'conv3' to 'conv4' are all the same: [13, 13]. What's going on?
Thanks!
If you look closely at the parameters of the conv2 layer, you'll notice that it sets a padding of 2.
That is, the input blob is padded by 2 extra pixels all around, thus the formula now is (27 - 5 + 2*2)/1 + 1 = 27, assuming a stride of 1.
Padding a kernel of size 5 with 2 pixels on each side yields an output of the same size as the input.
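To see the effect of the padding in numbers, here is a small sketch in Python. The helper `output_size` is a hypothetical name, and the stride of 1 is an assumption (the padding of 2 and kernel size of 5 come from the text above). The loop at the end illustrates the general pattern: with stride 1 and an odd kernel size F, padding by (F-1)/2 keeps the height and width unchanged, which would also explain layers whose outputs all stay at [13, 13], if they are padded this way.

```python
def output_size(w, f, p=0, s=1):
    """Spatial output size from (W - F + 2P)/S + 1 (hypothetical helper)."""
    return (w - f + 2 * p) // s + 1


# 'conv2': 27x27 input, 5x5 kernel, padding 2, stride 1 (stride is assumed)
conv2 = output_size(27, 5, p=2, s=1)
print(conv2)  # 27

# Without the padding the output would shrink instead
no_pad = output_size(27, 5, p=0, s=1)
print(no_pad)  # 23

# General pattern: stride 1, odd kernel f, padding (f - 1) // 2
# leaves the spatial size unchanged.
same = [output_size(13, f, p=(f - 1) // 2, s=1) for f in (3, 5, 7)]
print(same)  # [13, 13, 13]
```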