Binary cross-entropy vs. categorical cross-entropy

Posted 2020-04-20 15:36

When considering the problem of classifying an input into one of two classes, 99% of the examples I have seen use a NN with a single output and a sigmoid activation, followed by a binary cross-entropy loss. Another option I thought of is having the last layer produce 2 outputs and using categorical cross-entropy with C=2 classes, but I have never seen it in any example (both setups are sketched below). Is there any reason for that?
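For concreteness, here is a hypothetical Keras sketch of the two heads I mean (the layer sizes and input shape are made up):

```python
import tensorflow as tf

# Option A: single output + sigmoid + binary cross-entropy,
# with targets of shape (batch, 1) in {0, 1}.
model_a = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation='relu', input_shape=(16,)),
    tf.keras.layers.Dense(1, activation='sigmoid'),
])
model_a.compile(optimizer='adam', loss='binary_crossentropy')

# Option B: two outputs + softmax + categorical cross-entropy (C=2),
# with one-hot targets of shape (batch, 2).
model_b = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation='relu', input_shape=(16,)),
    tf.keras.layers.Dense(2, activation='softmax'),
])
model_b.compile(optimizer='adam', loss='categorical_crossentropy')
```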

Thanks

1 Answer
三岁会撩人
#2 · 2020-04-20 16:26

If you use a softmax on top of a two-output network, you get an output that is mathematically equivalent to using a single output with a sigmoid on top.
Do the math and you'll see.
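Spelled out, with two logits $z_1$ and $z_2$, the softmax probability of the first class is

$$
p_1 = \frac{e^{z_1}}{e^{z_1} + e^{z_2}} = \frac{1}{1 + e^{-(z_1 - z_2)}} = \sigma(z_1 - z_2),
$$

i.e. a sigmoid applied to the single effective logit $z_1 - z_2$.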

In practice, in my experience, if you look at the raw "logits" of the two-output net (before the softmax), you'll see that one is exactly the negative of the other. This happens because the gradients flowing into the two output neurons pull in exactly opposite directions.
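You can check the opposite-gradient claim directly: the gradient of categorical cross-entropy with respect to the logits is softmax(z) − y, and with two classes its components always sum to zero. A minimal numpy sketch (my illustration, not from the original answer):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())          # shift for numerical stability
    return e / e.sum()

def xent_logit_grad(z, y):
    # Gradient of categorical cross-entropy w.r.t. the logits.
    return softmax(z) - y

z = np.array([0.7, -0.3])            # arbitrary pair of logits
y = np.array([1.0, 0.0])             # one-hot target
g = xent_logit_grad(z, y)

print(g)                             # [-0.26894142  0.26894142]
assert np.isclose(g.sum(), 0.0)      # the two gradients are exact negatives
```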

Therefore, since both approaches are equivalent, the single-output configuration has fewer parameters and requires less computation, so it is more advantageous to use a single output with a sigmoid on top.
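To put a rough number on the saving (the layer width here is hypothetical, just for illustration): the final layer alone doubles in size with two outputs.

```python
d = 128  # hypothetical width of the penultimate layer

# Final-layer parameter counts: weights + biases.
params_single = d * 1 + 1   # one output neuron
params_double = d * 2 + 2   # two output neurons

print(params_single, params_double)  # 129 vs. 258
```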
