
Derivatives in some Deconvolution layers mostly all zeros

Posted 2019-07-30 02:18

Question:

This is a really weird error, partly a follow-up to my previous question (Deconvolution layer FCN initialization - loss drops too fast).

However I initialize the Deconv layers (bilinear or Gaussian), I get the same situation:

1) Weights are updated; I checked this over multiple iterations. The weight blobs of the deconvolution/upsample layers all have the same shape: (2, 2, 8, 8).

First of all, net_mcn.layers[idx].blobs[0].diff returns matrices of floats. For the last Deconv layer (upscore5) it produces two arrays with the same numbers but opposite signs, i.e. the weights should be moving at the same rate in opposite directions, yet the resulting weights are in fact almost identical!

Quite surprisingly, the remaining four deconv layers do not have this error. So when I compare models, for example at iter=5000 and iter=55000, the deconv layer weights are very different.

Even more surprisingly, other layers (convolutional) change much less!
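Roughly how I run these checks - this is a sketch, not my exact script: the solver/prototxt/caffemodel file names are placeholders, and the layer names upscore..upscore5 are from my net:

import numpy as np
import caffe

# take one solver step, then look at the weight gradients of the deconv layers
solver = caffe.SGDSolver('solver.prototxt')   # placeholder path
solver.step(1)
net_mcn = solver.net

deconv_names = ['upscore', 'upscore2', 'upscore3', 'upscore4', 'upscore5']
for name in deconv_names:
    d = net_mcn.params[name][0].diff          # weight gradients, shape (2, 2, 8, 8)
    print(name, d.shape, d.min(), d.max())
    # for upscore5 the two slices of this diff mirror each other
    # (same magnitudes, opposite signs), yet its stored weights stay almost identical

# compare the stored weights between two snapshots (placeholder file names)
net_a = caffe.Net('train.prototxt', 'snapshot_iter_5000.caffemodel', caffe.TEST)
net_b = caffe.Net('train.prototxt', 'snapshot_iter_55000.caffemodel', caffe.TEST)
for name in deconv_names:
    delta = np.abs(net_b.params[name][0].data - net_a.params[name][0].data)
    print(name, 'max weight change:', delta.max())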

Here's the relevant bit of the printout at initialization, confirming that the deconv layers are part of the backward pass and should be updated:

I0724 03:10:30.451787 32249 net.cpp:198] loss needs backward computation.
I0724 03:10:30.451792 32249 net.cpp:198] score_final needs backward computation.
I0724 03:10:30.451797 32249 net.cpp:198] upscore5 needs backward computation.
I0724 03:10:30.451802 32249 net.cpp:198] upscore4 needs backward computation.
I0724 03:10:30.451804 32249 net.cpp:198] upscore3 needs backward computation.
I0724 03:10:30.451807 32249 net.cpp:198] upscore2 needs backward computation.
I0724 03:10:30.451810 32249 net.cpp:198] upscore needs backward computation.
I0724 03:10:30.451814 32249 net.cpp:198] score_fr3 needs backward computation.
I0724 03:10:30.451818 32249 net.cpp:198] score_fr2 needs backward computation.
I0724 03:10:30.451822 32249 net.cpp:198] score_fr needs backward computation.

2) Blob diffs are all zeros for the deconvolution layers

The data-stream diffs (see Finding gradient of a Caffe conv-filter with regards to input) are all zeros for almost ALL deconv layers for the full duration of training, with a few exceptions that are also near 0 (e.g. -2.28945263e-09).

Convolution layer diffs look OK.

I see this as a paradox: the weights in the deconv layers are updated, but the diffs w.r.t. the neurons are all 0's (constant?).
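A sketch of the check behind this paradox, reusing the net_mcn handle from the snippet above and assuming each deconv layer's top blob shares the layer's name:

import numpy as np

# contrast the two kinds of gradients after one forward+backward pass:
#   net.params[name][0].diff -> gradient w.r.t. the layer's weights
#   net.blobs[name].diff     -> gradient w.r.t. the layer's output activations
net_mcn.forward()
net_mcn.backward()
for name in ['upscore', 'upscore2', 'upscore3', 'upscore4', 'upscore5']:
    w_diff = np.abs(net_mcn.params[name][0].diff).max()   # non-zero in my runs
    a_diff = np.abs(net_mcn.blobs[name].diff).max()       # essentially zero in my runs
    print('%-9s max |weight diff| %.3e   max |blob diff| %.3e' % (name, w_diff, a_diff))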

3) Deconv features grow really large quickly

Far larger than in FCN and CRFasRNN, up to 5.4e+03; at the same time, nearby pixels can have wildly different values (e.g. 5e+02 and -300) for the same class.
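A quick way to see the magnitudes I mean (same net_mcn handle and blob-name assumption as above):

# range of the deconv feature maps after a forward pass
net_mcn.forward()
for name in ['upscore', 'upscore2', 'upscore3', 'upscore4', 'upscore5']:
    feat = net_mcn.blobs[name].data
    print('%-9s data range: %.1f .. %.1f' % (name, feat.min(), feat.max()))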

4) Training and validation error go down, often very quickly

As I pointed out in the question referenced above.

So, putting it all together, I don't understand what to make of it. If it is overfitting, then why does the validation error decrease too?

The architecture of the network is

fc7->relu1->dropout->conv2048->conv1024->conv512->deconv1->deconv2->deconv3->deconv4->deconv5->crop->softmax_with_loss
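For completeness, the bilinear initialization mentioned at the top follows the usual FCN-style interp surgery; a rough sketch of what I do (function names are mine, and the shape check matches the (2, 2, 8, 8) deconv weights):

import numpy as np

def upsample_filt(size):
    # 2-D bilinear interpolation kernel of the given spatial size
    factor = (size + 1) // 2
    center = factor - 1 if size % 2 == 1 else factor - 0.5
    og = np.ogrid[:size, :size]
    return ((1 - abs(og[0] - center) / factor) *
            (1 - abs(og[1] - center) / factor))

def interp_surgery(net, layer_names):
    # fill each (c, c, k, k) deconv weight blob with the bilinear kernel,
    # one kernel per matching input/output channel pair, zeros elsewhere
    for name in layer_names:
        m, k, h, w = net.params[name][0].data.shape
        if m != k or h != w:
            print('skipping', name, '- unexpected weight shape', (m, k, h, w))
            continue
        filt = upsample_filt(h)
        net.params[name][0].data[...] = 0
        net.params[name][0].data[range(m), range(k), :, :] = filt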

EDIT: I was wrong, not all entries in net.blobs[...].diff are 0's, but most are, especially as the layers get larger. This depends on the data size.
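Roughly how I quantify this (same net_mcn handle as above):

import numpy as np

# fraction of exactly-zero entries in each blob's diff after one backward pass
net_mcn.forward()
net_mcn.backward()
for name, blob in net_mcn.blobs.items():
    zeros = np.count_nonzero(blob.diff == 0)
    print('%-12s %5.1f%% zeros in diff, shape %s'
          % (name, 100.0 * zeros / blob.diff.size, blob.diff.shape))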