In the output layer of a neural network, it is typical to use the softmax function to approximate a probability distribution:

$$P(y = j \mid \mathbf{z}) = \frac{e^{z_j}}{\sum_{k=1}^{K} e^{z_k}}$$
This is expensive to compute because of the exponents. Why not simply perform a Z transform so that all outputs are positive, and then normalise just by dividing all outputs by the sum of all outputs?
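For concreteness, the kind of normalization I have in mind would be something like the following rough sketch (shifting by the minimum so all outputs are non-negative, then dividing by the sum; the exact shift is not important):

```python
import numpy as np

def naive_normalise(z):
    # Shift so every output is non-negative, then divide by the sum.
    shifted = z - z.min()
    return shifted / shifted.sum()

def softmax(z):
    e = np.exp(z)
    return e / e.sum()

z = np.array([1.0, 2.0, 3.0])
print(naive_normalise(z))  # [0.    0.333 0.667] -- the smallest output always gets probability 0
print(softmax(z))          # [0.090 0.245 0.665]
```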
Adding to Piotr Czapla's answer: for inputs in the same proportion, the larger the input values, the greater the probability assigned to the maximum input compared to the others:
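For instance, with a quick NumPy sketch (the exact numbers are just illustrative):

```python
import numpy as np

def softmax(z):
    e = np.exp(z)
    return e / e.sum()

# Inputs in the same 1:2 proportion, at different scales:
print(softmax(np.array([1.0, 2.0])))    # ~[0.269, 0.731]
print(softmax(np.array([10.0, 20.0])))  # ~[4.5e-05, 0.99995]
```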
I have found the explanation here to be very good: CS231n: Convolutional Neural Networks for Visual Recognition.
On the surface the softmax algorithm seems to be a simple non-linear normalization (we spread the data with an exponential). However, there is more to it than that.
Specifically, there are a couple of different views (same link as above):
Information Theory - from the perspective of information theory the softmax function can be seen as trying to minimize the cross-entropy between the predictions and the truth.
Probabilistic View - from this perspective we are in fact looking at log-probabilities, so when we exponentiate them we end up with the raw probabilities. In this case the softmax equation finds the MLE (Maximum Likelihood Estimate).
In summary, even though the softmax equation seems like it could be arbitrary, it is NOT. It is actually a rather principled way of normalizing the classification scores so as to minimize the cross-entropy/negative log-likelihood between the predictions and the truth.
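To make the cross-entropy/negative log-likelihood view concrete, here is a minimal NumPy sketch (my own example, not code from the CS231n notes):

```python
import numpy as np

def softmax(z):
    e = np.exp(z)
    return e / e.sum()

def cross_entropy(logits, true_class):
    # Negative log of the probability that softmax assigns to the true class.
    return -np.log(softmax(logits)[true_class])

logits = np.array([2.0, 1.0, 0.1])
print(cross_entropy(logits, 0))  # ~0.417: the true class already has the largest score
print(cross_entropy(logits, 2))  # ~2.317: the true class is given little probability
```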
We are looking at a multi-class classification problem. The predicted variable $y$ can take one of $k$ values, where $k > 2$. In probability terms, this follows a multinomial distribution, and the multinomial distribution belongs to a large family called the exponential family. Using the properties of exponential-family distributions, we can reconstruct the probability $P(y = i \mid x)$, and it coincides with the softmax formula. For further information and a formal proof, see the CS229 lecture notes (Softmax Regression).
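A rough sketch of that view in code, with a hypothetical weight matrix W just to show the shape of the computation (this is softmax regression in the spirit of the CS229 notes, not a formal derivation):

```python
import numpy as np

rng = np.random.default_rng(0)
k, d = 4, 3                  # k classes, d input features (hypothetical sizes)
W = rng.normal(size=(k, d))  # one weight vector per class
x = rng.normal(size=d)

logits = W @ x                                 # one score per class
probs = np.exp(logits) / np.exp(logits).sum()  # P(y = i | x) for each class i
print(probs, probs.sum())                      # a valid distribution: non-negative, sums to 1
```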
A useful trick commonly applied to softmax: softmax(x) = softmax(x + c) for any constant c; that is, softmax is invariant to constant offsets in the input.
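In practice this invariance is what makes softmax safe to compute numerically: choosing c = -max(x) keeps the exponents from overflowing without changing the result. A minimal sketch:

```python
import numpy as np

def softmax_stable(x):
    # softmax(x) == softmax(x + c); choosing c = -max(x) keeps the largest
    # exponent at e^0 = 1, so np.exp never overflows.
    shifted = x - np.max(x)
    e = np.exp(shifted)
    return e / e.sum()

x = np.array([1000.0, 1001.0, 1002.0])
# np.exp(1000.0) overflows to inf, but the shifted version is fine:
print(softmax_stable(x))   # ~[0.090, 0.245, 0.665]
```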