Gradient calculation for softmax version of triplet loss

Posted 2019-02-19 15:49

I have been trying to implement the softmax version of the triplet loss in Caffe described in
Hoffer and Ailon, Deep Metric Learning Using Triplet Network, ICLR 2015.

I have tried to work this out, but I am finding it hard to calculate the gradient, since the L2 norm in the exponent is not squared.

Can someone please help me here?

2 Answers
Answered 2019-02-19 16:14

Implementing the L2 norm using existing Caffe layers can save you all the hassle.

Here's one way to compute ||x1-x2||_2 in Caffe for "bottom" blobs x1 and x2 (assuming x1 and x2 are B-by-C blobs, this computes B norms of C-dimensional differences):

layer {
  name: "x1-x2"
  type: "Eltwise"
  bottom: "x1"
  bottom: "x1"
  top: "x1-x2"
  eltwise_param {
    operation: SUM
    # coefficients 1 and -1 turn the SUM into elementwise subtraction x1 - x2
    coeff: 1 coeff: -1
  }
}
layer {
  name: "sqr_norm"
  type: "Reduction"
  bottom: "x1-x2"
  top: "sqr_norm"
  reduction_param { operation: SUMSQ axis: 1 }  # per-sample sum of squared differences
}
layer {
  name: "sqrt"
  type: "Power"
  bottom: "sqr_norm"
  top: "sqrt"
  power_param { power: 0.5 }  # square root of the sum of squares, i.e. the L2 norm
}

For the triplet loss defined in the paper, you need to compute the L2 norm for x-x+ and for x-x-, concatenate these two blobs, and feed the concatenated blob to a "Softmax" layer.
No need for dirty gradient computations.
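
As a rough sketch of that last step (not tested; all blob and layer names below, such as dist_pos and dist_neg, are made up for the example): assume the norm stack above is instantiated twice, once for x-x+ and once for x-x-, producing two distance blobs dist_pos and dist_neg. Since "Reduction" with axis: 1 leaves each distance as a 1-D blob of length B, a "Reshape" to B-by-1 is needed before concatenating along axis 1:

layer {
  name: "dist_pos_2d"
  type: "Reshape"
  bottom: "dist_pos"
  top: "dist_pos_2d"
  reshape_param { shape { dim: 0 dim: 1 } }  # dim: 0 keeps the batch size, giving a B-by-1 blob
}
layer {
  name: "dist_neg_2d"
  type: "Reshape"
  bottom: "dist_neg"
  top: "dist_neg_2d"
  reshape_param { shape { dim: 0 dim: 1 } }
}
layer {
  name: "dists"
  type: "Concat"
  bottom: "dist_pos_2d"
  bottom: "dist_neg_2d"
  top: "dists"
  concat_param { axis: 1 }  # B-by-2: column 0 is ||x-x+||_2, column 1 is ||x-x-||_2
}
layer {
  name: "softmax_dists"
  type: "Softmax"
  bottom: "dists"
  top: "softmax_dists"
  softmax_param { axis: 1 }  # softmax over the two distances of each sample
}

The loss from the paper is then the mean squared error between this softmax output and the constant target (0, 1), which could be implemented with a "EuclideanLoss" layer against a constant target blob; Caffe backpropagates through all of these layers automatically.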

可以哭但决不认输i · Answered 2019-02-19 16:28

This is a math question, but here goes. The first equation below is the gradient of the squared norm that you are used to; the second is what you get when the norm is not squared.

[Image: derivation of the norm gradient]
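
In plain terms, for the difference x - y the two gradients are (the second drops the square via the chain rule through the square root):

d/dx ||x - y||_2^2 = 2 (x - y)

d/dx ||x - y||_2 = d/dx sqrt( ||x - y||_2^2 ) = 2 (x - y) / (2 ||x - y||_2) = (x - y) / ||x - y||_2

So the non-squared gradient is just the familiar one rescaled by 1 / (2 ||x - y||_2), and it is undefined at x = y. This is also exactly the factor a square-root layer (e.g. Caffe's "Power" layer with power: 0.5) contributes during backprop, which is why the layer-based approach in the other answer needs no hand-written gradient.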
