Caffe classification labels in HDF5

Posted 2019-05-22 16:00

I am fine-tuning a network. In one case I want to use it for regression, which works. In another case, I want to use it for classification.

For both cases I have an HDF5 file with a label. For regression, the label is just a 1-by-1 numpy array containing a float. I thought I could use the same label for classification after changing my EuclideanLoss layer to SoftmaxWithLoss. However, I then get a negative loss, like so:

    Iteration 19200, loss = -118232
    Train net output #0: loss = 39.3188 (* 1 = 39.3188 loss)

Can you explain if, and if so what, goes wrong? I do see that the training loss is about 40 (which is still terrible), but does the network still train? The negative loss just keeps getting more negative.

UPDATE
After reading Shai's comment and answer, I have made the following changes:
- I changed the num_output of my last fully connected layer to 6, as I have 6 labels (it used to be 1).
- I now create a one-hot vector and pass that as a label into my HDF5 dataset as follows:

    f['label'] = numpy.array([1, 0, 0, 0, 0, 0])  # one-hot vector of shape (6,)

Trying to run my network now returns

    Check failed: hdf_blobs_[i]->shape(0) == num (6 vs. 1)

After some research online, I reshaped the vector to a 1x6 vector. This led to the following error:

    Check failed: outer_num_ * inner_num_ == bottom[1]->count() (40 vs. 240)
    Number of labels must match number of predictions; e.g., if softmax axis == 1
    and prediction shape is (N, C, H, W), label count (number of labels)
    must be N*H*W, with integer values in {0, 1, ..., C-1}.

My idea is to add one label per data point (image), and to let my train.prototxt create the batches. Shouldn't this produce the correct batch size?

1 Answer

小情绪 Triste · 2019-05-22 16:29

Since you moved from regression to classification, you need to output not a scalar to compare with "label", but rather a probability vector of length num-labels to compare with the discrete class "label". You need to change the num_output parameter of the layer before "SoftmaxWithLoss" from 1 to num-labels.
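For reference, a minimal sketch of what the tail of such a net might look like, written with Caffe's Python NetSpec interface (the file name train_h5_list.txt, the layer names, and the batch size are hypothetical, and a real net would have more layers between the data and the classifier):

    import caffe
    from caffe import layers as L

    n = caffe.NetSpec()
    # HDF5 data layer reading the 'data' and 'label' datasets from each listed file
    n.data, n.label = L.HDF5Data(source='train_h5_list.txt', batch_size=40, ntop=2)
    # classifier: num_output must equal the number of classes (6), not 1
    n.fc_out = L.InnerProduct(n.data, num_output=6)
    # SoftmaxWithLoss compares the 6-vector prediction with the scalar class label
    n.loss = L.SoftmaxWithLoss(n.fc_out, n.label)
    print(n.to_proto())  # emits the corresponding train.prototxt snippet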

I believe you are currently accessing uninitialized memory, and I would expect Caffe to crash sooner or later in this case.

Update:
You made two changes: num_output 1-->6, and you also changed your input label from a scalar to a vector.
The first change was the only one you needed for the "SoftmaxWithLoss" layer.
Do not change the label from a scalar to a one-hot vector; keep it a scalar class index (a sketch of writing such labels appears at the end of this answer).

Why?
Because "SoftmaxWithLoss" basically looks at the 6-vector prediction you output, interpret the ground-truth label as index and looks at -log(p[label]): the closer p[label] is to 1 (i.e., you predicted high probability for the expected class) the lower the loss. Making a prediction p[label] close to zero (i.e., you incorrectly predicted low probability for the expected class) then the loss grows fast.


Using a "hot-vector" as ground-truth input label, may give rise to multi-category classification (does not seems like the task you are trying to solve here). You may find this SO thread relevant to that particular case.
