What is cross-entropy?

Published 2019-01-29 17:03

I know that there are a lot of explanations of what cross-entropy is, but I'm still confused.

Is it only a method to describe the loss function? Then we can use, for example, the gradient descent algorithm to find the minimum. Or is it the whole process, including the algorithm that finds the minimum?

1 answer
三岁会撩人
#2 · 2019-01-29 17:30

Cross-entropy is commonly used to quantify the difference between two probability distributions. Usually the "true" distribution (the one that your machine learning algorithm is trying to match) is expressed in terms of a one-hot distribution.

For example, suppose for a specific training instance, the label is B (out of the possible labels A, B, and C). The one-hot distribution for this training instance is therefore:

Pr(Class A)  Pr(Class B)  Pr(Class C)
        0.0          1.0          0.0

You can interpret the above "true" distribution to mean that the training instance has 0% probability of being class A, 100% probability of being class B, and 0% probability of being class C.

Now, suppose your machine learning algorithm predicts the following probability distribution:

Pr(Class A)  Pr(Class B)  Pr(Class C)
      0.228        0.619        0.153

How close is the predicted distribution to the true distribution? That is what the cross-entropy loss determines. Use this formula:

H(p, q) = -sum_x p(x) * ln(q(x))

where p(x) is the true (target) probability and q(x) is the predicted probability. The sum runs over the three classes A, B, and C. In this case the loss is 0.479:

H = - (0.0*ln(0.228) + 1.0*ln(0.619) + 0.0*ln(0.153)) = 0.479
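That worked example can be checked in a few lines of Python (a minimal sketch using only the standard library; the variable names are just for illustration):

```python
import math

# True (one-hot) distribution and the model's predicted distribution
p = [0.0, 1.0, 0.0]        # Pr(A), Pr(B), Pr(C) -- the label is B
q = [0.228, 0.619, 0.153]  # predicted probabilities

# Cross-entropy: H(p, q) = -sum_x p(x) * ln(q(x))
loss = -sum(pi * math.log(qi) for pi, qi in zip(p, q))
print(loss)  # about 0.479, matching the worked example above
```

The zero entries of p wipe out every term except the one for the true class, so the loss reduces to -ln(q of the correct class).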

So that is how "wrong" or "far away" your prediction is from the true distribution.

Cross-entropy is one of many possible loss functions (another popular one is the SVM hinge loss). These loss functions are typically written as J(theta) and can be used within gradient descent, an iterative procedure for moving the parameters (or coefficients) towards the optimum values. In the equation below, you would replace J(theta) with H(p, q). But note that you need to compute the derivative of H(p, q) with respect to the parameters first.

theta_j := theta_j - alpha * dJ(theta)/dtheta_j    (repeat until convergence)
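To make the connection concrete, here is a small sketch (an illustration, not production code) of gradient descent minimizing the cross-entropy loss, where the parameters are the logits feeding a softmax. It uses the known identity that for softmax followed by cross-entropy, the gradient with respect to logit j is simply q_j - p_j; the starting logits and learning rate are arbitrary choices:

```python
import math

def softmax(z):
    # Numerically stable softmax: subtract the max before exponentiating
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(p, q):
    # H(p, q) = -sum_x p(x) * ln(q(x)); skip zero-probability terms
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.0, 1.0, 0.0]   # one-hot target: the label is B
z = [0.1, 0.2, 0.3]   # hypothetical starting logits (the "theta")
alpha = 0.5           # learning rate

initial_loss = cross_entropy(p, softmax(z))
for _ in range(200):
    q = softmax(z)
    # For softmax + cross-entropy, dH/dz_j = q_j - p_j
    grad = [qj - pj for pj, qj in zip(p, q)]
    z = [zj - alpha * gj for zj, gj in zip(z, grad)]
final_loss = cross_entropy(p, softmax(z))

print(initial_loss, final_loss)  # the loss shrinks towards 0
```

Each iteration nudges the logits in the direction that raises the predicted probability of the true class B, which is exactly the update rule above with J(theta) replaced by H(p, q).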

So to answer your original questions directly:

Is it only a method to describe the loss function?

Correct, cross-entropy describes the loss between two probability distributions. It is one of many possible loss functions.

Then we can use, for example, the gradient descent algorithm to find the minimum.

Yes, the cross-entropy loss function can be used as part of gradient descent.

Further reading: one of my other answers related to TensorFlow.
