I read the YOLOv2 implementation, and I have some questions about its loss. Below is my pseudocode for the loss function; I hope I got it right.
costs = np.zeros(output.shape)
for pred_box in all prediction boxes:
    if max IoU of pred_box with all truth boxes < threshold:
        costs[pred_box][obj] = (sigmoid(obj) - 0)^2 * 1
    else:
        costs[pred_box][obj] = 0
    costs[pred_box][x] = (sigmoid(x) - 0.5)^2 * 0.01
    costs[pred_box][y] = (sigmoid(y) - 0.5)^2 * 0.01
    costs[pred_box][w] = (w - 0)^2 * 0.01
    costs[pred_box][h] = (h - 0)^2 * 0.01
for truth_box in all ground truth boxes:
    pred_box = the one prediction box that is supposed to predict truth_box
    scale = 2 - (truew * trueh) / (imagew * imageh)
    costs[pred_box][obj] = (1 - sigmoid(obj))^2 * 5
    costs[pred_box][x] = (sigmoid(x) - truex)^2 * scale
    costs[pred_box][y] = (sigmoid(y) - truey)^2 * scale
    costs[pred_box][w] = (w - log(truew / anchorw))^2 * scale
    costs[pred_box][h] = (h - log(trueh / anchorh))^2 * scale
    costs[pred_box][classes] = softmax_euclidean
total_loss = sum(costs)
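To check my understanding, here is the pseudocode above as a runnable NumPy sketch. This is my own simplification, not the actual implementation: coordinates and sizes are assumed normalized to [0, 1], the truth assignment is a hypothetical `assigned` dict, the class term is omitted, and the loss scales (5, 1, 0.01) follow the pseudocode.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def yolo_v2_loss(pred, iou_max, assigned, truth, anchors,
                 thresh=0.6, noobj_scale=1.0, obj_scale=5.0, prior_scale=0.01):
    """Sketch of the YOLOv2 loss over N predicted boxes.

    pred:     (N, 5) raw network outputs [tx, ty, tw, th, tobj]
    iou_max:  (N,)   best IoU each predicted box has with any truth box
    assigned: dict {box_index: truth_index} for responsible boxes
    truth:    (T, 4) ground truth [x, y, w, h], normalized to [0, 1]
    anchors:  (N, 2) anchor (w, h) per box, normalized to [0, 1]
    """
    loss = 0.0
    for i in range(len(pred)):
        tx, ty, tw, th, tobj = pred[i]
        if i in assigned:
            t = truth[assigned[i]]
            # with normalized sizes: 2 - (truew*trueh)/(imagew*imageh)
            scale = 2.0 - t[2] * t[3]
            loss += obj_scale * (1.0 - sigmoid(tobj)) ** 2
            loss += scale * (sigmoid(tx) - t[0]) ** 2
            loss += scale * (sigmoid(ty) - t[1]) ** 2
            loss += scale * (tw - np.log(t[2] / anchors[i, 0])) ** 2
            loss += scale * (th - np.log(t[3] / anchors[i, 1])) ** 2
            # class term (softmax_euclidean) omitted for brevity
        else:
            # objectness is pushed to 0 only when overlap with every truth is low
            if iou_max[i] < thresh:
                loss += noobj_scale * (sigmoid(tobj) - 0.0) ** 2
            # small pull back toward the anchor prior (cell center, anchor size)
            loss += prior_scale * (sigmoid(tx) - 0.5) ** 2
            loss += prior_scale * (sigmoid(ty) - 0.5) ** 2
            loss += prior_scale * (tw - 0.0) ** 2
            loss += prior_scale * (th - 0.0) ** 2
    return loss
```

Note that, as in the pseudocode, an assigned box's prior/objectness terms are replaced by the truth terms rather than added on top.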
Here are my questions:
1. The code randomly resizes the training images to dimensions between 320 and 608 every 10 batches, but the anchor boxes aren't resized accordingly. Why not resize the anchors too? I mean, the anchors were selected as the most common box shapes on a 13*13 feature map; those anchors won't be common on a 19*19 feature map, so why not resize the anchors according to the image size?
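To make the mismatch concrete, here is the arithmetic behind this question (a sketch; the stride of 32 follows from the 13*13 grid at 416 px input, but the anchor width of 3.0 cells is a made-up illustrative value):

```python
# With a fixed stride of 32 px, the grid size tracks the input size:
# 320 -> 10x10, 416 -> 13x13, 608 -> 19x19.
stride = 32
anchor_w_cells = 3.0  # hypothetical anchor width, in grid-cell units

fractions = []
for image_size in (320, 416, 608):
    grid = image_size // stride
    # an anchor fixed in cell units keeps the same pixel width (3 * 32 px),
    # so the fraction of the image it covers shrinks as the image grows
    fractions.append(anchor_w_cells / grid)
```

So a fixed anchor covers about 23% of a 416-px image but only about 16% of a 608-px one, which is the scale drift the question is about.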
2. Boxes that aren't assigned a truth still get a cost on their x, y, w, h predictions, which by default pushes w, h to exactly fit the anchor and x, y to the cell center. Is that helpful, and why? Why not apply the location cost only to boxes assigned a truth and ignore the unassigned ones?
3. Why not simply apply (obj - 0)^2 as the objectness cost for all boxes with no truth assigned? In YOLOv2, not all such boxes are given an objectness cost: only the ones that have no truth assigned and also don't overlap much with any truth are penalized. Why? It seems complicated.
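The rule I'm asking about in question 3 can be written as a small mask computation. This is a sketch with my own IoU helper on corner-format boxes, and 0.6 as an assumed ignore threshold; in the full loss, the box actually assigned to a truth is handled in the second loop and its objectness cost is overwritten anyway.

```python
import numpy as np

def iou(box, boxes):
    """IoU of one [x1, y1, x2, y2] box against an (N, 4) array of boxes."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (box[2] - box[0]) * (box[3] - box[1])
    area_b = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area_a + area_b - inter)

def noobj_mask(pred_boxes, truth_boxes, thresh=0.6):
    """True where a predicted box gets the no-object penalty:
    its best IoU with every truth box is below the threshold."""
    best = np.array([iou(p, truth_boxes).max() for p in pred_boxes])
    return best < thresh
```

The effect is that a box sitting almost on top of an object, but not chosen as the responsible predictor, is neither pushed toward 0 nor toward 1 for objectness.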