I am using the sigmoid cross-entropy loss function for a multi-label classification problem, as laid out by this tutorial. However, in both the tutorial's results and my own, the output predictions fall in the range (-Inf, Inf), while the range of a sigmoid is [0, 1]. Is the sigmoid only applied during backprop? That is, shouldn't a forward pass squash the output?
Answer 1:
In this example, the input to the "SigmoidCrossEntropyLoss" layer is the output of a fully-connected layer. Indeed, there are no constraints on the output values of an "InnerProduct" layer, so they can lie anywhere in (-inf, inf).
However, if you examine "SigmoidCrossEntropyLoss" carefully, you'll notice that it includes a "Sigmoid" internally: fusing the sigmoid with the cross-entropy loss gives a numerically stable loss and gradient computation.
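To see why the fusion helps, here is one common numerically stable formulation of per-element sigmoid cross-entropy on the raw logit x with label z in {0, 1} (a reference sketch, not necessarily the exact expression Caffe's source uses):

$$\ell(x, z) = \max(x, 0) - x\,z + \log\!\bigl(1 + e^{-|x|}\bigr)$$

This is algebraically equal to $-\bigl[z\log\sigma(x) + (1-z)\log(1-\sigma(x))\bigr]$, but it never exponentiates a large positive number, so it does not overflow for large |x| the way computing the sigmoid and the log separately can.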
Therefore, at test time you should replace the "SigmoidCrossEntropyLoss" layer with a plain "Sigmoid" layer to output per-class predictions in [0, 1], as in the sketch below.
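A minimal prototxt sketch of the swap; the blob names ("fc_out", "label", "prob") are assumptions and should match whatever your InnerProduct layer actually produces:

```
# Train/val net: fused sigmoid + cross-entropy loss on the raw logits
layer {
  name: "loss"
  type: "SigmoidCrossEntropyLoss"
  bottom: "fc_out"   # logits from the InnerProduct layer (name assumed)
  bottom: "label"
  top: "loss"
}

# Deploy/test net: plain sigmoid, producing per-class probabilities in [0, 1]
layer {
  name: "prob"
  type: "Sigmoid"
  bottom: "fc_out"
  top: "prob"
}
```

The trained weights are unaffected by this change, since the loss layer has no parameters; only the head of the network definition differs between the training and deploy prototxt files.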