I have an input image of size 416 x 416. How can I create an output of 4 x 10, where 4 is the number of columns and 10 the number of rows?
My label data is a 2D array with 4 columns and 10 rows.
I know about the reshape() method, but it requires that the resulting shape have the same number of elements as the input.
With a 416 x 416 input and max pooling layers, I can only get down to a 13 x 13 output.
Is there a way to achieve a 4 x 10 output without losing data?
For example, my input label data looks like this:
[[ 0 0 0 0]
[ 0 0 0 0]
[ 0 0 0 0]
[ 0 0 0 0]
[ 0 0 0 0]
[ 0 0 0 0]
[ 0 0 0 0]
[116 16 128 51]
[132 16 149 52]
[ 68 31 77 88]
[ 79 34 96 92]
[126 37 147 112]
[100 41 126 116]]
This indicates there are 6 objects in my image that I want to detect; the first value is xmin, the second ymin, the third xmax, and the fourth ymax.
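(For context, this is roughly how such a fixed-size label array can be built by zero-padding a variable number of boxes to a fixed number of rows; make_label and max_boxes are just illustrative names, not something from my actual code.)
import numpy as np

def make_label(object_boxes, max_boxes=10):
    # object_boxes: list of [xmin, ymin, xmax, ymax]; pad with zero rows up to max_boxes
    label = np.zeros((max_boxes, 4), dtype=np.int32)
    object_boxes = np.asarray(object_boxes, dtype=np.int32)
    label[max_boxes - len(object_boxes):] = object_boxes  # zero rows first, real boxes last
    return label

label = make_label([[116, 16, 128, 51], [132, 16, 149, 52], [68, 31, 77, 88],
                    [79, 34, 96, 92], [126, 37, 147, 112], [100, 41, 126, 116]])
print(label.shape)  # (10, 4)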
The last layer of my network looks like this:
(None, 13, 13, 1024)
First, flatten the (None, 13, 13, 1024) layer:
model.add(Flatten())
This gives a 1-dimensional tensor of 13 * 13 * 1024 = 173056 elements.
Then add a dense layer:
model.add(Dense(4 * 10))
It will output 40 values; at this point your 3D feature map has been transformed into a 1D vector.
Then simply reshape to the shape you need (note that Reshape takes the target shape as a tuple):
model.add(Reshape((10, 4)))
This will work, but it will absolutely destroy the spatial nature of your data.
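Put together, a minimal sketch of this approach could look like the following (the convolution/pooling backbone that produces the (13, 13, 1024) feature map is only represented here by its output shape):
from keras.models import Sequential
from keras.layers import Flatten, Dense, Reshape

model = Sequential()
# stand-in for the real backbone: only the (13, 13, 1024) feature-map shape is assumed
model.add(Flatten(input_shape=(13, 13, 1024)))  # -> (None, 173056)
model.add(Dense(4 * 10, activation='relu'))     # -> (None, 40)
model.add(Reshape((10, 4)))                     # -> (None, 10, 4)
model.summary()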
I believe the easiest way to conform your predictions' shape to the desired output is the solution proposed by @Darlyn. Assuming the network you have so far (the one that outputs tensors of shape (13, 13, 1024)) was declared like this:
from keras.layers import Input, Conv2D
from keras.models import Model

x = Input(shape=(416, 416, 3))
y = Conv2D(32, (3, 3), activation='relu')(x)
...
y = Conv2D(1024, (3, 3), activation='relu')(y)
model = Model(inputs=x, outputs=y)
You just need to add a regression layer that will try to predict the boxes, and then reshape these to (10, 4):
import numpy as np
from keras.layers import Flatten, Dense, Reshape

samples = 1
boxes = 10

y = Flatten(name='flatten')(model.output)
y = Dense(boxes * 4, activation='relu')(y)
y = Reshape((boxes, 4), name='predictions')(y)
model = Model(inputs=model.inputs, outputs=y)
x_train = np.random.randn(samples, 416, 416, 3)
p = model.predict(x_train)
print(p.shape)
(1, 10, 4)
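If you do go with direct regression, a minimal (untested) training sketch on top of the model above could use mean squared error against the padded (10, 4) label arrays; y_train below is only a placeholder for your real labels:
# y_train: one (boxes, 4) label array per image; random values here, purely illustrative
y_train = np.random.randn(samples, boxes, 4)
model.compile(optimizer='adam', loss='mse')
model.fit(x_train, y_train, batch_size=1, epochs=1)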
This works, but I'm not entirely sure that directly regressing these values will produce good results. I usually see object-detection models using attention, region proposals, or saliency maps to determine the positions of objects. There are a couple of object-detection Keras implementations you could try:
keras-rcnn
classes = ["dog", "cat", "hooman"]
backbone = keras_rcnn.models.backbone.VGG16  # assumes keras_rcnn has been imported
model = keras_rcnn.models.RCNN((416, 416, 3), classes, backbone)
boxes, predictions = model.predict(x)  # x: a batch of input images, not the Input tensor above
keras-retinanet
from keras_retinanet.models.resnet import resnet_retinanet

x = Input(shape=(416, 416, 3))
model = resnet_retinanet(len(classes), inputs=x)   # classes as defined above
_, _, boxes, _ = model.predict_on_batch(inputs)    # inputs: a batch of images