Why is the softmax function necessary? Why not simple normalization?

Posted 2019-08-07 12:01

I am not familiar with deep learning, so this might be a beginner question. In my understanding, the softmax function in multi-layer perceptrons is responsible for normalizing the outputs and distributing probability across the classes. If so, why don't we use simple normalization instead?

Let's say we get a vector x = (10, 3, 2, 1). Applying softmax, the output will be y = (0.9986, 0.0009, 0.0003, 0.0001).

Applying simple normalization (dividing each element by the sum, 16), the output will be y = (0.625, 0.1875, 0.125, 0.0625).
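To make the comparison concrete, here is a minimal NumPy sketch (the vector x and both formulas are taken from the example above) that reproduces both outputs:

```python
import numpy as np

def softmax(x):
    # Exponentiate, shifting by the max for numerical stability
    # (the shift does not change the result).
    e = np.exp(x - np.max(x))
    return e / e.sum()

def simple_norm(x):
    # Divide each element by the sum of all elements.
    return x / x.sum()

x = np.array([10.0, 3.0, 2.0, 1.0])
print(softmax(x))      # [0.9986 0.0009 0.0003 0.0001]
print(simple_norm(x))  # [0.625  0.1875 0.125  0.0625]
```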

It seems like simple normalization could also distribute the probabilities. So, what is the advantage of using the softmax function on the output layer?

2 Answers
手持菜刀,她持情操
#2 · 2019-08-07 12:38

Simple normalization does not always produce probabilities: it fails when some of the values are negative, and it is undefined when the values sum to zero.

Taking the exponential of the logits changes that: exp is strictly positive, so the sum can never be zero, and it maps the full range of the logits onto valid probabilities. Softmax is preferred because it actually works in all cases.
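A small sketch of both failure modes this answer mentions, with hypothetical inputs (x1 has negative entries, x2 sums to zero):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))  # exp is strictly positive, so the sum is never zero
    return e / e.sum()

x1 = np.array([2.0, -1.0, 1.0])  # contains negative values
print(x1 / x1.sum())             # [ 1.  -0.5  0.5] -- a "negative probability"
print(softmax(x1))               # [0.7054 0.0351 0.2595] -- a valid distribution

x2 = np.array([1.0, -1.0])       # values sum to zero
print(x2 / x2.sum())             # division by zero -> [ inf -inf]
print(softmax(x2))               # [0.8808 0.1192] -- still well-defined
```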

可以哭但决不认输i
#3 · 2019-08-07 13:02

This depends on the training loss function. Many models are trained with a log loss, so the values you see in that output vector estimate the log of each class probability. Softmax is then merely converting back to linear values and normalizing.
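A minimal sketch of that inverse relationship, assuming a hypothetical distribution p: if the logits equal the log-probabilities up to an additive constant, softmax recovers p exactly, because softmax is invariant to adding a constant to every logit.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

p = np.array([0.7, 0.2, 0.1])  # hypothetical true class distribution
logits = np.log(p) + 5.0       # log-probabilities shifted by an arbitrary constant
print(softmax(logits))         # [0.7 0.2 0.1] -- the original distribution is recovered
```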

The empirical reason is simple: softmax is used where it produces better results.
