Need help understanding the Caffe code for Sigmoid

Posted 2019-08-07 07:03

I need help understanding Caffe's SigmoidCrossEntropyLossLayer, which computes the cross-entropy error with a logistic (sigmoid) activation.

Basically, the cross-entropy error for a single example with N independent targets is denoted as:

 - sum-over-N( t[i] * log(x[i]) + (1 - t[i]) * log(1 - x[i]) )

where t[i] is the target (0 or 1) and x[i] is the output for index i, that is, the value after the logistic activation.

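An algebraic trick for quicker cross-entropy calculation reduces each term of the sum to the following, where x[i] now denotes the raw input to the logistic activation rather than its output: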

 -t[i] * x[i] + log(1 + exp(x[i])) 

and you can verify that from Section 3 here.
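For reference, the reduction can also be checked by hand in a few lines. This is only a sketch of the algebra for one element, assuming x is the pre-activation input and writing σ for the logistic function (notation introduced here, not taken from the linked notes):

```latex
\begin{align*}
\sigma(x) &= \frac{1}{1+e^{-x}} = \frac{e^{x}}{1+e^{x}}, \\
\log \sigma(x) &= x - \log(1+e^{x}), \qquad \log\bigl(1-\sigma(x)\bigr) = -\log(1+e^{x}), \\
-\bigl[t \log \sigma(x) + (1-t)\log\bigl(1-\sigma(x)\bigr)\bigr]
  &= -\bigl[t x - t\log(1+e^{x}) - (1-t)\log(1+e^{x})\bigr] \\
  &= -t x + \log(1+e^{x}).
\end{align*}
```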

The question is: how does the expression above translate into the loss-calculating code below?

   loss -= input_data[i] * (target[i] - (input_data[i] >= 0)) -
        log(1 + exp(input_data[i] - 2 * input_data[i] * (input_data[i] >= 0)));

Thank you.

The function is reproduced below for convenience.

   template <typename Dtype>
    void SigmoidCrossEntropyLossLayer<Dtype>::Forward_cpu(
        const vector<Blob<Dtype>*>& bottom, const vector<Blob<Dtype>*>& top) {
      // The forward pass computes the sigmoid outputs.
      sigmoid_bottom_vec_[0] = bottom[0];
      sigmoid_layer_->Forward(sigmoid_bottom_vec_, sigmoid_top_vec_);
      // Compute the loss (negative log likelihood)
      // Stable version of loss computation from input data
      const Dtype* input_data = bottom[0]->cpu_data();
      const Dtype* target = bottom[1]->cpu_data();
      int valid_count = 0;
      Dtype loss = 0;
      for (int i = 0; i < bottom[0]->count(); ++i) {
        const int target_value = static_cast<int>(target[i]);
        if (has_ignore_label_ && target_value == ignore_label_) {
          continue;
        }
        loss -= input_data[i] * (target[i] - (input_data[i] >= 0)) -
            log(1 + exp(input_data[i] - 2 * input_data[i] * (input_data[i] >= 0)));
        ++valid_count;
      }
      normalizer_ = get_normalizer(normalization_, valid_count);
      top[0]->mutable_cpu_data()[0] = loss / normalizer_;
    }

1 answer
爷、活的狠高调
#2 · 2019-08-07 07:55

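The expression log(1 + exp(x[i])) is numerically unstable when x[i] is large and positive, because exp(x[i]) overflows. To avoid this, one rescales the sigmoid by multiplying its numerator and denominator by the same factor, where (x>=0) evaluates to 1 if x is non-negative and 0 otherwise: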

 sig(x) = exp(x)/(1+exp(x))
        = [exp(x)*exp(-x*(x>=0))]/[(1+exp(x))*exp(-x*(x>=0))]

Now, if you plug this new, stable expression for sig(x) into the loss, you'll end up with the same expression that Caffe is using.
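To see the equivalence case by case: for x >= 0 the Caffe line gives -t*x + x + log(1 + exp(-x)), and x + log(1 + exp(-x)) equals log(1 + exp(x)); for x < 0 the indicator is 0 and the line is already -t*x + log(1 + exp(x)). The following is a minimal C++ sketch for checking this numerically. The names naive_loss and stable_loss are made up for illustration and are not part of Caffe; stable_loss only mirrors the arithmetic of the line quoted in the question, for a single element.

```cpp
#include <cmath>

// Direct form from the question: -t*x + log(1 + exp(x)).
// std::exp(x) overflows to infinity once x exceeds roughly 709 for a double.
double naive_loss(double x, double t) {
  return -t * x + std::log(1.0 + std::exp(x));
}

// Same arithmetic as the quoted Caffe line, written for one element.
// The exponent x - 2*x*(x >= 0) equals -|x|, so exp() never sees a
// large positive argument.
double stable_loss(double x, double t) {
  const double nonneg = (x >= 0) ? 1.0 : 0.0;  // the (input_data[i] >= 0) term
  return -x * (t - nonneg) + std::log(1.0 + std::exp(x - 2.0 * x * nonneg));
}
```

For moderate inputs the two functions agree up to rounding. For x = 1000 and t = 1, naive_loss returns infinity because exp(1000) overflows, while stable_loss evaluates exp(-1000) and stays finite.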

Enjoy!
