I am finetuning a network. In a specific case I want to use it for regression, which works. In another case, I want to use it for classification.
For both cases I have an HDF5 file with a label. For regression, this is just a 1-by-1 numpy array that contains a float. I thought I could use the same label for classification after changing my EuclideanLoss layer to SoftmaxLoss. However, I then get a negative loss, like so:
Iteration 19200, loss = -118232
Train net output #0: loss = 39.3188 (* 1 = 39.3188 loss)
Can you explain if, and if so what, goes wrong? I do see that the training loss is about 40 (which is still terrible), but does the network still train? The negative loss just keeps getting more negative.
UPDATE
After reading Shai's comment and answer, I have made the following changes:
- I made the `num_output` of my last fully connected layer 6, as I have 6 labels (it used to be 1).
- I now create a one-hot vector and pass that as a label into my HDF5 dataset as follows:
f['label'] = numpy.array([1, 0, 0, 0, 0, 0])
Trying to run my network now returns
Check failed: hdf_blobs_[i]->shape(0) == num (6 vs. 1)
After some research online, I reshaped the vector to a 1x6 vector. This led to the following error:
Check failed: outer_num_ * inner_num_ == bottom[1]->count() (40 vs. 240) Number of labels must match number of predictions; e.g., if softmax axis == 1 and prediction shape is (N, C, H, W), label count (number of labels) must be N*H*W, with integer values in {0, 1, ..., C-1}.
My idea is to add one label per data sample (image), and the batches are created in my train.prototxt. Shouldn't that produce the correct batch size?
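To make the two check failures above concrete, here is a minimal sketch of how the label dataset's shape interacts with them (assuming h5py; the file name and the data shape are made up, and the batch of 40 comes from the second error message):

```python
import h5py
import numpy as np

with h5py.File('train.h5', 'w') as f:
    # One image worth of data (hypothetical shape):
    f['data'] = np.zeros((1, 3, 224, 224), dtype=np.float32)

    # A flat length-6 label has first dimension 6, but 'data' has first
    # dimension 1, so the HDF5Data layer fails with "6 vs. 1":
    # f['label'] = np.array([1, 0, 0, 0, 0, 0], dtype=np.float32)

    # Reshaping it to (1, 6) passes that check, but SoftmaxWithLoss then sees
    # 6 label values per image instead of 1, giving "40 vs. 240" with a batch of 40:
    f['label'] = np.array([[1, 0, 0, 0, 0, 0]], dtype=np.float32)
```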
Since you moved from regression to classification, you need to output not a scalar to compare with `"label"` but rather a probability vector of length num-labels to compare with the discrete class `"label"`. You need to change the `num_output` parameter of the layer before `"SoftmaxWithLoss"` from 1 to num-labels.

I believe you are currently accessing un-initialized memory, and I would expect caffe to crash sooner or later in this case.
Update:
You made two changes: `num_output` 1-->6, and you also changed your input `label` from a scalar to a vector.
The first change was the only one you needed for using `"SoftmaxWithLoss"` layer.
Do not change `label` from a scalar to a "hot-vector".

Why?
Because `"SoftmaxWithLoss"` basically looks at the 6-vector prediction you output, interprets the ground-truth `label` as an index and looks at `-log(p[label])`: the closer `p[label]` is to 1 (i.e., you predicted a high probability for the expected class), the lower the loss. Making a prediction with `p[label]` close to zero (i.e., you incorrectly predicted a low probability for the expected class) makes the loss grow fast.

Using a "hot-vector" as the ground-truth input `label` may give rise to multi-category classification (which does not seem to be the task you are trying to solve here). You may find this SO thread relevant to that particular case.
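A small numpy sketch of the `-log(p[label])` computation described above (illustrative values only): with `num_output` = 6 the loss only needs the scalar class index to pick out one probability, which is why a one-hot label is unnecessary.

```python
import numpy as np

logits = np.array([2.0, 0.1, -1.3, 0.5, 0.0, -0.7])  # output of the 6-unit layer (made-up values)
p = np.exp(logits - logits.max())
p /= p.sum()                                          # softmax probabilities, length 6

label = 0                      # scalar ground-truth class index, not a one-hot vector
print(-np.log(p[label]))       # close to 0 when p[label] is near 1, large when it is near 0
```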