Need help understanding the Caffe code for Sigmoid

I need help understanding Caffe's SigmoidCrossEntropyLossLayer, which computes the cross-entropy error with a logistic activation.

Basically, the cross-entropy error for a single example with N independent targets is denoted as:

 - sum-over-N( t[i] * log(x[i]) + (1 - t[i]) * log(1 - x[i]) )

where t is the target (0 or 1) and x is the output, indexed by i. x, of course, goes through a logistic activation.

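An algebraic trick for quicker cross-entropy calculation reduces the computation to the following per-element term, where x[i] is now the input to the logistic activation rather than its output: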

 -t[i] * x[i] + log(1 + exp(x[i]))

and you can verify that from Section 3 here.
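To make the trick concrete, here is a minimal sketch that compares the two forms numerically for moderate inputs. The function names (sigmoid, ce_from_output, ce_from_logit, check_forms_agree) are made up for illustration and are not part of Caffe; z stands for the value before the logistic activation.

```cpp
#include <cassert>
#include <cmath>

// Logistic (sigmoid) activation.
double sigmoid(double z) { return 1.0 / (1.0 + std::exp(-z)); }

// Cross-entropy written in terms of the sigmoid output p = sigmoid(z).
double ce_from_output(double z, double t) {
  const double p = sigmoid(z);
  return -(t * std::log(p) + (1.0 - t) * std::log(1.0 - p));
}

// The simplified form, written directly in terms of the pre-activation z.
double ce_from_logit(double z, double t) {
  return -t * z + std::log(1.0 + std::exp(z));
}

// Sanity check: both forms agree for moderate inputs and binary targets.
void check_forms_agree() {
  const double zs[] = {-3.0, -0.5, 0.0, 0.5, 3.0};
  const double ts[] = {0.0, 1.0};
  for (double z : zs) {
    for (double t : ts) {
      assert(std::fabs(ce_from_output(z, t) - ce_from_logit(z, t)) < 1e-9);
    }
  }
}
```

The identity behind it is log(sigmoid(z)) = z - log(1 + exp(z)) and log(1 - sigmoid(z)) = -log(1 + exp(z)); substituting both into the first formula leaves -t * z + log(1 + exp(z)).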

The question is: how does the above translate to the loss-calculating code below?

    loss -= input_data[i] * (target[i] - (input_data[i] >= 0)) -
        log(1 + exp(input_data[i] - 2 * input_data[i] * (input_data[i] >= 0)));

Thank you.

The function is reproduced below for convenience.

    template <typename Dtype>
    void SigmoidCrossEntropyLossLayer<Dtype>::Forward_cpu(
        const vector<Blob<Dtype>*>& bottom, const vector<Blob<Dtype>*>& top) {
      // The forward pass computes the sigmoid outputs.
      sigmoid_bottom_vec_[0] = bottom[0];
      sigmoid_layer_->Forward(sigmoid_bottom_vec_, sigmoid_top_vec_);
      // Compute the loss (negative log likelihood)
      // Stable version of loss computation from input data
      const Dtype* input_data = bottom[0]->cpu_data();
      const Dtype* target = bottom[1]->cpu_data();
      int valid_count = 0;
      Dtype loss = 0;
      for (int i = 0; i < bottom[0]->count(); ++i) {
        const int target_value = static_cast<int>(target[i]);
        if (has_ignore_label_ && target_value == ignore_label_) {
          continue;
        }
        loss -= input_data[i] * (target[i] - (input_data[i] >= 0)) -
            log(1 + exp(input_data[i] - 2 * input_data[i] * (input_data[i] >= 0)));
        ++valid_count;
      }
      normalizer_ = get_normalizer(normalization_, valid_count);
      top[0]->mutable_cpu_data()[0] = loss / normalizer_;
    }

Tags: deep-learning caffe logistic-regression cross-entropy

1 Answer

爷、活的狠高调

#2 · 2019-08-07 07:55

In the expression log(1 + exp(x[i])) you might encounter numerical instability when x[i] is very large, because exp(x[i]) overflows. To overcome this, one rescales the sigmoid function as follows, where (x >= 0) evaluates to 1 when x is non-negative and to 0 otherwise:

 sig(x) = exp(x) / (1 + exp(x))
        = [exp(x) * exp(-x * (x >= 0))] / [(1 + exp(x)) * exp(-x * (x >= 0))]

Now, if you plug the new, stable expression for sig(x) into the loss, you'll end up with the same expression that Caffe is using.
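Spelled out: for x >= 0, log(1 + exp(x)) = x + log(1 + exp(-x)), so the per-element term -t * x + log(1 + exp(x)) becomes -x * (t - 1) + log(1 + exp(-x)); for x < 0 it stays as it is. Both cases collapse into the single line in the loop once (x >= 0) is used as a 0/1 factor. The sketch below checks this numerically. The helper names (naive_term, caffe_term, check_stable_form) are hypothetical and only mirror the arithmetic of the Caffe line; this is not the library's code.

```cpp
#include <cassert>
#include <cmath>

// Direct form: -t*x + log(1 + exp(x)). exp(x) overflows for large positive x.
double naive_term(double x, double t) {
  return -t * x + std::log(1.0 + std::exp(x));
}

// Same arithmetic as the Caffe loop body, returned as a positive loss term
// (Caffe subtracts the inner expression from `loss`, hence the negation).
double caffe_term(double x, double t) {
  const double nonneg = (x >= 0) ? 1.0 : 0.0;  // the (input_data[i] >= 0) factor
  return -(x * (t - nonneg) -
           std::log(1.0 + std::exp(x - 2.0 * x * nonneg)));
}

// Both forms agree for moderate x; only the Caffe form survives large x.
void check_stable_form() {
  const double xs[] = {-5.0, -0.1, 0.0, 0.1, 5.0};
  const double ts[] = {0.0, 1.0};
  for (double x : xs) {
    for (double t : ts) {
      assert(std::fabs(naive_term(x, t) - caffe_term(x, t)) < 1e-12);
    }
  }
  assert(std::isinf(naive_term(1000.0, 1.0)));     // exp(1000) overflows
  assert(std::isfinite(caffe_term(1000.0, 1.0)));  // exponent is -1000 here
}
```

Note that x - 2 * x * (x >= 0) is just -|x|, so the argument of exp is never positive and cannot overflow.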

Enjoy!
