Caffe: Softmax with temperature

Posted 2019-06-28 03:06


I am working on implementing Hinton's knowledge distillation paper. The first step is to store the soft targets of a "cumbersome model" at a higher temperature (i.e. I don't need to train the network, I just need to do a forward pass per image and store the soft targets computed with a temperature T).
Is there a way to get the soft targets of AlexNet or GoogLeNet, but with a different temperature?
I need to modify the softmax to pi = exp(zi/T) / sum_j exp(zj/T).
That is, I need to divide the outputs of the final fully connected layer by a temperature T before the softmax. I only need this for the forward pass (not for training).


I believe there are several options to solve this problem:

1. Implement your own Softmax layer with a temperature parameter. It should be quite straightforward to modify the code of softmax_layer.cpp to take a "temperature" T into account. You might need to tweak caffe.proto as well so that the Softmax layer can be parsed with the extra parameter.

2. Implement the layer as a Python layer.

3. If you only need a forward pass, i.e. "extracting features", then you can simply output the "top" of the layer just before the softmax layer as your features and compute the softmax with temperature outside Caffe altogether.

4. You can add a Scale layer before the top Softmax layer:

layer {
  type: "Scale"
  name: "temperature"
  bottom: "zi"
  top: "zi/T"
  scale_param {
    filler: { type: 'constant' value: 1/T }  # replace "1/T" with the actual numeric value of 1/T, e.g. 0.5 for T = 2
  }
  param { lr_mult: 0 decay_mult: 0 }  # make sure the temperature is fixed
}
layer {
  type: "Softmax"
  name: "prob"
  bottom: "zi/T"
  top: "pi"
}
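
For option 3 (doing the softmax outside Caffe), the computation itself is small. Below is a minimal sketch in plain Python, assuming you have already extracted the pre-softmax outputs (the logits zi) for one image as a list of floats. The function name softmax_with_temperature and the sample logits are made up for illustration and are not part of Caffe.

```python
import math


def softmax_with_temperature(logits, T=1.0):
    """Return softmax(logits / T) for a 1-D sequence of floats."""
    if T <= 0:
        raise ValueError("temperature T must be positive")
    scaled = [z / T for z in logits]
    # Subtract the maximum before exponentiating for numerical stability;
    # this shifts every exponent equally and leaves the result unchanged.
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits, standing in for the blob that feeds the Softmax layer.
logits = [2.0, 1.0, 0.1]
hard = softmax_with_temperature(logits, T=1.0)  # ordinary softmax
soft = softmax_with_temperature(logits, T=5.0)  # softer distribution
```

With T = 1 this reduces to the ordinary softmax. A larger T flattens the distribution while preserving the ranking of the classes, which is what the soft targets in distillation rely on.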