I wish to use a loss layer of type InfogainLoss in my model, but I am having difficulties defining it properly.

Is there any tutorial/example on the usage of the INFOGAIN_LOSS layer?

Should the input to this layer, the class probabilities, be the output of a SOFTMAX layer, or is it enough to input the "top" of a fully connected layer?

INFOGAIN_LOSS requires three inputs: class probabilities, labels and the matrix H.
The matrix H can be provided either as a layer parameter, infogain_loss_param { source: "filename" }, or as the third "bottom" of the layer.
Suppose I have a python script that computes H as a numpy.array of shape (L,L) with dtype='f4' (where L is the number of labels in my model). How can I convert my numpy.array into a binproto file that can be provided as an infogain_loss_param { source } to the model?

Suppose I want H to be provided as the third input ("bottom") to the loss layer (rather than as a model parameter). How can I do this? Do I define a new data layer whose "top" is H? If so, wouldn't the data of this layer be incremented every training iteration, like the training data is? How can I define multiple unrelated input "data" layers, and how does caffe know to read from the training/testing "data" layer batch after batch, while reading from the H "data" layer only once for the whole training process?
Since I had to search through many websites to piece together the complete code, I thought I'd share my implementation:
Python layer for computing the H-matrix with weights for each class:
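A minimal sketch of such a layer, assuming the per-class weights are passed in through param_str; the module, class and blob names used here (e.g. HMatrixLayer) are placeholders, not necessarily the original implementation:

```python
import json

import numpy as np
import caffe


class HMatrixLayer(caffe.Layer):
    """Outputs a constant (1, 1, L, L) infogain matrix H whose diagonal
    holds per-class weights (e.g. inverse class frequencies)."""

    def setup(self, bottom, top):
        # param_str comes from python_param in the prototxt,
        # e.g. param_str: '{"weights": [0.2, 0.8]}'
        weights = np.asarray(json.loads(self.param_str)["weights"],
                             dtype=np.float32)
        self.H = np.diag(weights)      # off-diagonal entries stay zero

    def reshape(self, bottom, top):
        L = self.H.shape[0]
        top[0].reshape(1, 1, L, L)

    def forward(self, bottom, top):
        top[0].data[...] = self.H      # constant output every iteration

    def backward(self, top, propagate_down, bottom):
        pass                           # H is constant, nothing to back-propagate
```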
and the relevant part from the train_val.prototxt:
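A sketch of the corresponding prototxt, with module/layer/blob names matching the placeholders above (this requires Caffe built with WITH_PYTHON_LAYER=1):

```
layer {
  name: "H"
  type: "Python"
  top: "H"
  python_param {
    module: "h_matrix_layer"    # the .py file that defines HMatrixLayer
    layer: "HMatrixLayer"
    param_str: '{"weights": [0.2, 0.8]}'
  }
}
layer {
  name: "loss"
  type: "InfogainLoss"
  bottom: "prob"     # class probabilities (or raw scores with the merged layer)
  bottom: "label"
  bottom: "H"        # H supplied as the third bottom
  top: "loss"
}
```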
1. Is there any tutorial/example on the usage of the InfogainLoss layer?
A nice example can be found here: using InfogainLoss to tackle class imbalance.
2. Should the input to this layer, the class probabilities, be the output of a Softmax layer?

Historically, the answer used to be YES, according to Yair's answer: the old implementation of "InfogainLoss" needed its input to be the output of a "Softmax" layer (or of any other layer that makes sure the input values are in the range [0..1]).

The OP noticed that using "InfogainLoss" on top of a "Softmax" layer can lead to numerical instability. His pull request, combining these two layers into a single one (much like the "SoftmaxWithLoss" layer), was accepted and merged into the official Caffe repositories on 14/04/2017. The mathematics of this combined layer are given here.

The upgraded layer's "look and feel" is exactly like the old one's, apart from the fact that one no longer needs to explicitly pass the input through a "Softmax" layer.
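As an illustration (blob names such as "fc8" and the binaryproto file name are placeholders, not taken from the original post), the old setup needed an explicit Softmax in front of the loss, while the merged layer accepts the raw class scores directly:

```
# Before the merge: explicit Softmax feeding the loss
layer { name: "prob" type: "Softmax" bottom: "fc8" top: "prob" }
layer {
  name: "loss"
  type: "InfogainLoss"
  bottom: "prob"
  bottom: "label"
  top: "loss"
  infogain_loss_param { source: "infogainH.binaryproto" }
}

# After the merge: raw scores go straight in; Softmax is applied internally
layer {
  name: "loss"
  type: "InfogainLoss"
  bottom: "fc8"
  bottom: "label"
  top: "loss"
  infogain_loss_param { source: "infogainH.binaryproto" }
}
```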
3. How can I convert a numpy.array into a binproto file?

In Python:
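A minimal sketch, assuming L is the number of labels, the identity matrix as an example H, and an arbitrary output file name:

```python
import numpy as np
import caffe

L = 3                          # number of labels in the model
H = np.eye(L, dtype='f4')      # example H; replace with your own values

# wrap the (L, L) array as a 1x1xLxL blob and serialize it to disk
blob = caffe.io.array_to_blobproto(H.reshape((1, 1, L, L)))
with open('infogainH.binaryproto', 'wb') as f:
    f.write(blob.SerializeToString())
```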
Now you can add the INFOGAIN_LOSS layer to the model prototxt, with H as a parameter:
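A sketch of such a layer definition, assuming the binaryproto file produced above and placeholder blob names:

```
layer {
  name: "infogain_loss"
  type: "InfogainLoss"
  bottom: "prob"      # class probabilities (or raw scores with the merged layer)
  bottom: "label"
  top: "infogain_loss"
  infogain_loss_param {
    source: "infogainH.binaryproto"
  }
}
```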
4. How to load H as part of a DATA layer

Quoting Evan Shelhamer's post:
The layer is summing up

-\sum_k H_{l,k} \log(p_k)

over the samples, and so the p_i's need to be in (0, 1] to make sense as a loss function (otherwise higher confidence scores will produce a higher loss). See the curve below for the values of log(p).
I don't think they have to sum up to 1, but passing them through a Softmax layer will achieve both properties.
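The curve of log(p) mentioned in the quote can be reproduced with a few lines of numpy/matplotlib (a sketch, not the original figure):

```python
import numpy as np
import matplotlib.pyplot as plt

p = np.linspace(1e-3, 1.0, 500)   # the valid probability range (0, 1]
plt.plot(p, np.log(p))
plt.xlabel('p')
plt.ylabel('log(p)')
plt.title('log(p) over (0, 1]')
plt.show()
```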