I have no problem understanding the output shape of a Dense layer that follows a Flatten layer. The output shape matches my understanding, i.e. (batch size, units).
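import keras  # standalone Keras assumed; with TensorFlow use: from tensorflow import keras

nn = keras.Sequential()
nn.add(keras.layers.Conv2D(8, kernel_size=(2, 2), input_shape=(4, 5, 1)))
nn.add(keras.layers.Conv2D(1, kernel_size=(2, 2)))
nn.add(keras.layers.Flatten())  # (None, 2, 3, 1) -> (None, 6)
nn.add(keras.layers.Dense(5))
nn.add(keras.layers.Dense(1))
nn.summary()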
Output is:
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
conv2d_1 (Conv2D)            (None, 3, 4, 8)           40
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 2, 3, 1)           33
_________________________________________________________________
flatten_1 (Flatten)          (None, 6)                 0
_________________________________________________________________
dense_1 (Dense)              (None, 5)                 35
_________________________________________________________________
dense_2 (Dense)              (None, 1)                 6
=================================================================
Total params: 114
Trainable params: 114
Non-trainable params: 0
_________________________________________________________________
But I am having trouble understanding the output shape of a Dense layer when its input is multidimensional. For the following code, where the Flatten layer is commented out:
nn = keras.Sequential()
nn.add(keras.layers.Conv2D(8, kernel_size=(2, 2), input_shape=(4, 5, 1)))
nn.add(keras.layers.Conv2D(1, kernel_size=(2, 2)))
# nn.add(keras.layers.Flatten())
nn.add(keras.layers.Dense(5))
nn.add(keras.layers.Dense(1))
nn.summary()
The output is:
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
conv2d_1 (Conv2D)            (None, 3, 4, 8)           40
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 2, 3, 1)           33
_________________________________________________________________
dense_1 (Dense)              (None, 2, 3, 5)           10
_________________________________________________________________
dense_2 (Dense)              (None, 2, 3, 1)           6
=================================================================
Total params: 89
Trainable params: 89
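As a sanity check on the two summaries, the parameter counts can be reproduced by hand. This is a minimal sketch assuming the usual formulas (weights plus one bias per filter or unit); the helper names `conv2d_params` and `dense_params` are made up for illustration and are not part of Keras. It suggests that, without Flatten, dense_1 sees only the last axis (size 1) as its input features.

```python
# Hypothetical helpers (not Keras API): parameter counts with one bias per output.

def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    # One kernel_h x kernel_w x in_channels kernel per filter, plus a bias each.
    return kernel_h * kernel_w * in_channels * filters + filters


def dense_params(in_features, units):
    # One weight per (input feature, unit) pair, plus a bias per unit.
    return in_features * units + units


conv2d_1 = conv2d_params(2, 2, 1, 8)  # 40
conv2d_2 = conv2d_params(2, 2, 8, 1)  # 33

# With Flatten: dense_1 receives 2 * 3 * 1 = 6 input features.
with_flatten = conv2d_1 + conv2d_2 + dense_params(6, 5) + dense_params(5, 1)

# Without Flatten: dense_1 receives only the last axis, i.e. 1 input feature.
without_flatten = conv2d_1 + conv2d_2 + dense_params(1, 5) + dense_params(5, 1)

print(with_flatten, without_flatten)  # 114 89
```

The count of 10 for dense_1 (1 * 5 + 5) is consistent with the kernel having shape (1, 5) regardless of the 2 x 3 spatial grid.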
I cannot build an intuition for the output shapes of the dense_1 and dense_2 layers. Shouldn't the final output be a scalar, or have shape (batch, units)?
The following answer to a similar question tries to explain the intuition, but I cannot fully grasp the concept.
From the same answer:
That is, each output "pixel" (i, j) in the 640x959 grid is calculated as a dense combination of the 8 different convolution channels at point (i, j) from the previous layer.
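The idea in the quoted sentence can be sketched in NumPy for the model in this question. This is only an illustration, assuming that a Dense layer on an N-D input multiplies along the last axis and reuses the same kernel at every other position; the arrays `W1`, `b1`, `W2`, `b2` are random stand-ins for the layer weights, not values taken from a trained model.

```python
import numpy as np

rng = np.random.default_rng(0)

batch = 4
# Stand-in for the output of conv2d_2: shape (batch, 2, 3, 1).
x = rng.normal(size=(batch, 2, 3, 1))

# Stand-in weights: dense_1 maps 1 -> 5 features, dense_2 maps 5 -> 1.
W1, b1 = rng.normal(size=(1, 5)), np.zeros(5)
W2, b2 = rng.normal(size=(5, 1)), np.zeros(1)

# Matrix multiplication acts on the last axis only; the leading axes
# (batch, 2, 3) are carried through unchanged.
h = x @ W1 + b1  # shape (4, 2, 3, 5), like dense_1
y = h @ W2 + b2  # shape (4, 2, 3, 1), like dense_2

# The same result is obtained by applying the layer to one "pixel" (i, j)
# at a time, which is what the quoted sentence describes.
i, j = 1, 2
per_pixel = x[:, i, j, :] @ W1 + b1  # shape (4, 5)
assert np.allclose(h[:, i, j, :], per_pixel)

print(h.shape, y.shape)
```

Under this reading, the 2 x 3 grid is never mixed by the Dense layers; only a Flatten (or a reshape) before them would give an output of shape (batch, units).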
Maybe an explanation with pictures would be useful.