Although both of the above methods provide better score for better closeness of prediction, still cross-entropy is preferred. Is it in every cases or there are some peculiar scenarios where we prefer cross-entropy over MSE?
相关问题
- How to conditionally scale values in Keras Lambda
- neural network does not learn (loss stays the same
- Trying to understand Pytorch's implementation
- Convolutional Neural Network seems to be randomly
- How to convert Onnx model (.onnx) to Tensorflow (.
相关文章
- how to flatten input in `nn.Sequential` in Pytorch
- How to use cross_val_score with random_state
- Looping through training data in Neural Networks B
- Why does this Keras model require over 6GB of memo
- How to measure overfitting when train and validati
- McNemar's test in Python and comparison of cla
- How to disable keras warnings?
- Invert MinMaxScaler from scikit_learn
If you do logistic regression for example, you will use the sigmoid function to estimate de probability, the cross entropy as the loss function and gradient descent to minimize it. Doing this but using MSE as the loss function might lead to a non-convex problem where you might find local minima. Using cross entropy will lead to a convex problem where you might find the optimum solution.
https://www.youtube.com/watch?v=rtD0RvfBJqQ&list=PL0Smm0jPm9WcCsYvbhPCdizqNKps69W4Z&index=35
There is also an interesting analysis here: https://jamesmccaffrey.wordpress.com/2013/11/05/why-you-should-use-cross-entropy-error-instead-of-classification-error-or-mean-squared-error-for-neural-network-classifier-training/
When you derive the cost function from the aspect of probability and distribution, you can observe that MSE happens when you assume the error follows Normal Distribution and cross entropy when you assume binomial distribution. It means that implicitly when you use MSE, you are doing regression (estimation) and when you use CE, you are doing classification. Hope it helps a little bit.
Cross-entropy is prefered for classification, while mean squared error is one of the best choices for regression. This comes directly from the statement of the problems itself - in classification you work with very particular set of possible output values thus MSE is badly defined (as it does not have this kind of knowledge thus penalizes errors in incompatible way). To better understand the phenomena it is good to follow and understand the relations between
You will notice that both can be seen as a maximum likelihood estimators, simply with different assumptions about the dependent variable.