A machine learning text I was reading said that it's a bad idea to approach a classification problem through regression. But it seems like it's always possible to fit the data with a continuous regression and then threshold the continuous prediction to get a discrete classification. So why is it a bad idea?
Tags: machine-learning
If you are doing classification, you want to optimize something related to misclassifications. You only care about predicting the right class. When you are doing regression, you want to minimize some measure of distortion between the prediction and the actual value. Mean squared error is a common penalty function for regression.
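As a toy illustration of the difference between the two objectives, the sketch below (with made-up numbers) computes a 0-1 misclassification rate and a mean squared error for the same continuous scores: a single score of 3.5 leaves the misclassification rate untouched but dominates the squared error.

```python
# A minimal sketch of the two objectives, using made-up numbers.
# Labels are encoded as 0/1; "scores" are continuous model outputs.
import numpy as np

y_true = np.array([1, 1, 0, 0])          # hypothetical class labels
scores = np.array([0.9, 3.5, 0.1, 0.4])  # hypothetical continuous predictions

# Classification objective: fraction misclassified after thresholding at 0.5.
y_pred = (scores >= 0.5).astype(int)
zero_one_loss = np.mean(y_pred != y_true)

# Regression objective: mean squared error against the numeric labels.
mse = np.mean((scores - y_true) ** 2)

print(zero_one_loss)  # 0.0 -- every example is on the correct side of 0.5
print(mse)            # ~1.6 -- dominated by the score of 3.5, even though its class is right
```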
Imagine optimizing the parameters of your regressor that is eventually going to do the classification. In comes an example that is obviously class 1, but whose numeric label is very, very large. In order to minimize the loss on this example, you have to shift your weights a lot to make the prediction extreme for that example. However, your classification border just moved a lot with it, hurting your classification accuracy: you over-compensated when you didn't need to.
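Here is a rough sketch of that story on made-up 1D data (assuming scikit-learn is available): all targets are 0 or 1 except one clearly-class-1 example that carries a huge numeric target. Thresholding a least-squares fit at 0.5 shifts the boundary into the class-0 cluster, while logistic regression, which models the classes directly, keeps it near 0.

```python
# A rough sketch (made-up 1D data, assuming scikit-learn) of the over-compensation
# story: one example that is clearly class 1 but carries a huge numeric target
# drags the least-squares fit, and the 0.5 cutoff moves into the class-0 cluster.
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2, 1, 50), rng.normal(2, 1, 50), [2.0]])
t = np.concatenate([np.zeros(50), np.ones(50), [50.0]])   # one extreme target value
X = x.reshape(-1, 1)
y = (t >= 0.5).astype(int)                                # the classes we actually care about

reg = LinearRegression().fit(X, t)       # regress on the raw targets, threshold later
clf = LogisticRegression().fit(X, y)     # model the classes directly

# Where each model's decision boundary ends up (prediction = 0.5 / probability = 0.5).
reg_boundary = (0.5 - reg.intercept_) / reg.coef_[0]
clf_boundary = -clf.intercept_[0] / clf.coef_[0, 0]
print("thresholded regression boundary:", round(float(reg_boundary), 2))  # pulled into the class-0 cluster
print("logistic regression boundary:   ", round(float(clf_boundary), 2))  # stays near 0

acc_reg = np.mean((reg.predict(X) >= 0.5).astype(int) == y)
acc_clf = np.mean(clf.predict(X) == y)
print("accuracy (regression + cutoff):", acc_reg)
print("accuracy (logistic):           ", acc_clf)
```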
You can read the figure referenced below (each loss plotted against the margin) as the amount you'll move your weights as a function of how badly you mis-predicted an example.
Most of the loss functions shown there are upper bounds on the misclassification loss, and models that optimize an upper bound on misclassification tend to classify well. Using regression for classification amounts to picking the squared-error loss and thereby misrepresenting what you actually want to optimize: the squared-error curve turns back upward on the right side of the graph, penalizing examples even as their classification becomes more and more confident, while the good classification losses are all either zero there or heading toward zero.
The figure referred to above is taken from the excellent The Elements of Statistical Learning (Hastie, Tibshirani & Friedman).
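For readers without the book at hand, a sketch along these lines (assuming matplotlib is available) reproduces the flavor of that figure: each loss is plotted against the margin y·f(x), and squared error is the one curve that turns back upward on the right, exactly where the classification is most confident.

```python
# A sketch (assuming matplotlib) of the kind of figure described above:
# several classification losses plotted against the margin m = y * f(x), y in {-1, +1}.
# Squared error is the only curve that grows again on the right, where the
# classification is already confidently correct.
import numpy as np
import matplotlib.pyplot as plt

m = np.linspace(-2, 2, 400)                       # margin y * f(x)
losses = {
    "misclassification (0-1)": (m < 0).astype(float),
    "hinge (SVM)": np.maximum(0.0, 1.0 - m),
    "binomial deviance (log loss)": np.log2(1.0 + np.exp(-m)),  # scaled to pass through 1 at m = 0
    "exponential (boosting)": np.exp(-m),
    "squared error": (1.0 - m) ** 2,
}
for name, loss in losses.items():
    plt.plot(m, loss, label=name)
plt.xlabel("margin  y * f(x)")
plt.ylabel("loss")
plt.ylim(0, 3)
plt.legend()
plt.show()
```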