I have an input image of size 416 x 416. How can I create an output of 4 x 10, where 4 is the number of columns and 10 the number of rows?
My label data is a 2D array with 4 columns and 10 rows.
I know about the reshape() method, but it requires that the resulting shape have the same number of elements as the input.
With a 416 x 416 input and max pooling layers, I can only get down to a 13 x 13 output.
Is there a way to achieve a 4 x 10 output without losing data?
For example, my input label data looks like this:
[[ 0 0 0 0]
[ 0 0 0 0]
[ 0 0 0 0]
[ 0 0 0 0]
[ 0 0 0 0]
[ 0 0 0 0]
[ 0 0 0 0]
[116 16 128 51]
[132 16 149 52]
[ 68 31 77 88]
[ 79 34 96 92]
[126 37 147 112]
[100 41 126 116]]
This indicates there are 6 objects in my image that I want to detect; the first value is xmin, the second ymin, the third xmax, and the fourth ymax.
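(For context, this is roughly how such a fixed-size label array can be built by zero-padding a variable number of boxes to a fixed number of rows; make_label and max_boxes are just illustrative names, not something from my actual code.)
import numpy as np

def make_label(object_boxes, max_boxes=10):
    # object_boxes: list of [xmin, ymin, xmax, ymax]; pad with zero rows up to max_boxes
    label = np.zeros((max_boxes, 4), dtype=np.int32)
    object_boxes = np.asarray(object_boxes, dtype=np.int32)
    label[max_boxes - len(object_boxes):] = object_boxes  # zero rows first, real boxes last
    return label

label = make_label([[116, 16, 128, 51], [132, 16, 149, 52], [68, 31, 77, 88],
                    [79, 34, 96, 92], [126, 37, 147, 112], [100, 41, 126, 116]])
print(label.shape)  # (10, 4)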
The last layer of my network looks like this:
(None, 13, 13, 1024)
First, flatten the (None, 13, 13, 1024) layer:
model.add(Flatten())
This gives a 1-dimensional tensor of 13 * 13 * 1024 = 173056 elements.
Then add a dense layer:
model.add(Dense(4 * 10))
It will output 40 values; at this point your 3D feature map has been transformed into a 1D vector.
Then simply reshape to the shape you need (note that Reshape takes the target shape as a tuple):
model.add(Reshape((10, 4)))
This will work, but it will absolutely destroy the spatial nature of your data.
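Put together, a minimal sketch of this approach could look like the following (the convolution/pooling backbone that produces the (13, 13, 1024) feature map is only represented here by its output shape):
from keras.models import Sequential
from keras.layers import Flatten, Dense, Reshape

model = Sequential()
# stand-in for the real backbone: only the (13, 13, 1024) feature-map shape is assumed
model.add(Flatten(input_shape=(13, 13, 1024)))  # -> (None, 173056)
model.add(Dense(4 * 10, activation='relu'))     # -> (None, 40)
model.add(Reshape((10, 4)))                     # -> (None, 10, 4)
model.summary()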
I believe the easiest way to conform your predictions' shape to the desired output is the solution proposed by @Darlyn. Assuming the network you have so far (the one that outputs tensors of shape (13, 13, 1024)) was declared like this:
from keras.layers import Input, Conv2D
from keras.models import Model

x = Input(shape=(416, 416, 3))
y = Conv2D(32, (3, 3), activation='relu')(x)
...
y = Conv2D(1024, (3, 3), activation='relu')(y)
model = Model(inputs=x, outputs=y)
You just need to add a regression layer that will try to predict the boxes, and then reshape these to (10, 4):
import numpy as np
from keras.layers import Flatten, Dense, Reshape

samples = 1
boxes = 10

y = Flatten(name='flatten')(model.output)
y = Dense(boxes * 4, activation='relu')(y)
y = Reshape((boxes, 4), name='predictions')(y)
model = Model(inputs=model.inputs, outputs=y)
x_train = np.random.randn(samples, 416, 416, 3)
p = model.predict(x_train)
print(p.shape)
(1, 10, 4)
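If you do go with direct regression, a minimal (untested) training sketch on top of the model above could use mean squared error against the padded (10, 4) label arrays; y_train below is only a placeholder for your real labels:
# y_train: one (boxes, 4) label array per image; random values here, purely illustrative
y_train = np.random.randn(samples, boxes, 4)
model.compile(optimizer='adam', loss='mse')
model.fit(x_train, y_train, batch_size=1, epochs=1)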
This works, but I'm not entirely sure that directly regressing these values will produce good results. I usually see object-detection models using attention, region proposals, or saliency maps to determine the positions of objects. There are a couple of object-detection Keras implementations you could try:
keras-rcnn
classes = ["dog", "cat", "hooman"]
backbone = keras_rcnn.models.backbone.VGG16  # assumes keras_rcnn has been imported
model = keras_rcnn.models.RCNN((416, 416, 3), classes, backbone)
boxes, predictions = model.predict(x)  # x: a batch of input images, not the Input tensor above
keras-retinanet
from keras_retinanet.models.resnet import resnet_retinanet

x = Input(shape=(416, 416, 3))
model = resnet_retinanet(len(classes), inputs=x)   # classes as defined above
_, _, boxes, _ = model.predict_on_batch(inputs)    # inputs: a batch of images