I need help in understanding the Caffe function SigmoidCrossEntropyLossLayer, which is the cross-entropy error with logistic activation.
Basically, the cross-entropy error for a single example with N independent targets is denoted as:
- sum-over-N( t[i] * log(x[i]) + (1 - t[i]) * log(1 - x[i]) )
where t is the target, 0 or 1, and x is the output, indexed by i. x, of course, goes through a logistic activation.
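An algebraic trick for quicker cross-entropy calculation reduces each term of the sum to the following, where x[i] now denotes the input to the logistic activation rather than its output: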
-t[i] * x[i] + log(1 + exp(x[i]))
and you can verify that from Section 3 here.
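For readers without the linked notes at hand, the reduction can be sketched as follows. This is a short derivation under the assumption that x_i is the pre-activation input and \sigma is the logistic function; the symbol \sigma is introduced here for illustration and does not appear in the original post.

```latex
\begin{align*}
\sigma(x_i) &= \frac{e^{x_i}}{1 + e^{x_i}}, \qquad
\log \sigma(x_i) = x_i - \log\left(1 + e^{x_i}\right), \qquad
\log\left(1 - \sigma(x_i)\right) = -\log\left(1 + e^{x_i}\right) \\
-\left[ t_i \log \sigma(x_i) + (1 - t_i) \log\left(1 - \sigma(x_i)\right) \right]
&= -\left[ t_i x_i - t_i \log\left(1 + e^{x_i}\right) - (1 - t_i) \log\left(1 + e^{x_i}\right) \right] \\
&= -t_i x_i + \log\left(1 + e^{x_i}\right)
\end{align*}
```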
The question is, how is the above translated to the loss calculating code below:
loss -= input_data[i] * (target[i] - (input_data[i] >= 0)) -
log(1 + exp(input_data[i] - 2 * input_data[i] * (input_data[i] >= 0)));
Thank you.
The function is reproduced below for convenience.
template <typename Dtype>
void SigmoidCrossEntropyLossLayer<Dtype>::Forward_cpu(
    const vector<Blob<Dtype>*>& bottom, const vector<Blob<Dtype>*>& top) {
  // The forward pass computes the sigmoid outputs.
  sigmoid_bottom_vec_[0] = bottom[0];
  sigmoid_layer_->Forward(sigmoid_bottom_vec_, sigmoid_top_vec_);
  // Compute the loss (negative log likelihood)
  // Stable version of loss computation from input data
  const Dtype* input_data = bottom[0]->cpu_data();
  const Dtype* target = bottom[1]->cpu_data();
  int valid_count = 0;
  Dtype loss = 0;
  for (int i = 0; i < bottom[0]->count(); ++i) {
    const int target_value = static_cast<int>(target[i]);
    if (has_ignore_label_ && target_value == ignore_label_) {
      continue;
    }
    loss -= input_data[i] * (target[i] - (input_data[i] >= 0)) -
        log(1 + exp(input_data[i] - 2 * input_data[i] * (input_data[i] >= 0)));
    ++valid_count;
  }
  normalizer_ = get_normalizer(normalization_, valid_count);
  top[0]->mutable_cpu_data()[0] = loss / normalizer_;
}
In the expression
log(1 + exp(x[i]))
you might encounter numerical instability in case x[i]
is very large. To overcome this numerical instability, one scales the sigmoid function like this:
sig(x) = exp(x) / (1 + exp(x)) = exp(x - x * (x >= 0)) / (1 + exp(x - 2 * x * (x >= 0)))
Now, if you plug the new and stable expression for sig(x) into the loss, you'll end up with the same expression as Caffe is using. Enjoy!
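To see the effect concretely, here is a minimal C++ sketch that compares the naive per-element loss with the rearranged one. The function names naive_loss and stable_loss are hypothetical and are not part of Caffe; the second function only mirrors the arithmetic of the expression inside the Caffe loop, using double instead of Dtype.

```cpp
#include <cmath>

// Naive per-element loss: -t * x + log(1 + exp(x)).
// exp(x) overflows to infinity once x is large and positive.
double naive_loss(double x, double t) {
  return -t * x + std::log(1.0 + std::exp(x));
}

// Rearranged per-element loss, mirroring the expression in the Caffe loop.
// The indicator (x >= 0) makes the argument of exp() always non-positive,
// so exp() can underflow to 0 but never overflow.
double stable_loss(double x, double t) {
  const double pos = (x >= 0) ? 1.0 : 0.0;
  return -(x * (t - pos) - std::log(1.0 + std::exp(x - 2.0 * x * pos)));
}
```

For moderate inputs the two functions should agree up to rounding. For an input such as x = 1000 the naive version evaluates exp(1000), which overflows in double precision, while the rearranged version only ever evaluates exp(-1000).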