Question:
A machine learning resource said that it's a bad idea to approach a classification problem through regression. But I think it's always possible to fit the data with a continuous regression and then truncate the continuous prediction to yield a discrete classification. So why is it a bad idea?
Answer 1:
If you are doing classification, you want to optimize something related to misclassifications. You only care about predicting the right class. When you are doing regression, you want to minimize some measure of distortion between the prediction and the actual value. Mean squared error is a common penalty function for regression.
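As a concrete illustration of the two objectives, here is a minimal sketch (plain numpy, with made-up scores) that computes the misclassification rate classification cares about next to the mean squared error regression minimizes, on the same predictions:

import numpy as np

# Labels encoded as +1 / -1, with continuous scores from some hypothetical model.
y = np.array([+1, +1, -1, -1, +1])
scores = np.array([0.8, 2.5, -0.3, -1.2, 0.1])

# Classification objective: fraction of sign disagreements (0-1 loss).
misclassification_rate = np.mean(np.sign(scores) != y)

# Regression objective: mean squared distance to the +/-1 targets.
mse = np.mean((scores - y) ** 2)

print(misclassification_rate)  # 0.0   -- every example is on the correct side
print(mse)                     # 0.726 -- yet the squared error is far from zero

The two numbers can disagree arbitrarily: the scores above classify perfectly, but a squared-error optimizer would still keep moving the weights to shrink the 0.726.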
Imagine optimizing the parameters of a regressor that is eventually going to do classification. In comes an example that is obviously class 1, but whose regression target is very, very large. In order to minimize the loss on this example, you have to shift your weights a lot to make the prediction extreme for it. But now your classification boundary has just moved a lot, hurting your classification accuracy. You over-compensated when you didn't need to.
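Here is a minimal numpy sketch of this over-compensation. In this variant the extreme example keeps an ordinary +1 label but sits at an extreme feature value, which forces the same large weight shift: a 1-D least-squares "classifier" whose decision boundary gets dragged by one point it was already classifying correctly.

import numpy as np

# Two well-separated classes on a line, labels in {-1, +1}.
x = np.array([-2.0, -1.0, 1.0, 2.0])
y = np.array([-1.0, -1.0, 1.0, 1.0])

w, b = np.polyfit(x, y, 1)       # least-squares fit f(x) = w*x + b
print(-b / w)                    # decision boundary f(x) = 0 sits at x ~ 0

# Add one point that is *obviously* class +1 (x = 100), label still +1.
x2 = np.append(x, 100.0)
y2 = np.append(y, 1.0)

w2, b2 = np.polyfit(x2, y2, 1)
print(-b2 / w2)                  # boundary jumps to x ~ 1.37 ...
print(np.sign(w2 * 1.0 + b2))    # ... so the +1 point at x = 1 is now misclassified (-1)

One unambiguous class-1 point moved the boundary past another class-1 point. A loss that only cared about the sign of the prediction would have left the boundary alone.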
[Figure: common classification loss functions plotted against the margin y·f(x), from The Elements of Statistical Learning.]
You can view this graph as showing how much you'll move your weights as a function of how badly you mis-predicted an example.
Most of the loss functions in that figure are upper bounds on the misclassification loss, and models that optimize an upper bound on misclassification do classification well. Using regression for classification amounts to picking the squared-error loss, which misrepresents what you actually want to optimize. This shows up as the upward bend on the right side of the squared-error curve: the loss keeps growing even as the classification becomes more and more confident, while the well-behaved classification losses are all either zero or tending toward zero there.
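To see that upward bend numerically, here is a small sketch evaluating the standard surrogate losses from that figure as a function of the margin m = y·f(x) (these are the textbook formulas, not code from the book):

import numpy as np

m = np.array([-2.0, -1.0, 0.0, 1.0, 2.0, 3.0])  # margins y * f(x)

zero_one = (m <= 0).astype(float)    # misclassification loss
squared  = (1.0 - m) ** 2            # (y - f)^2 = (1 - m)^2 for y in {-1, +1}
hinge    = np.maximum(0.0, 1.0 - m)  # SVM hinge loss
logistic = np.log1p(np.exp(-m))      # binomial deviance

for row in zip(m, zero_one, squared, hinge, logistic):
    print("m=%+.0f  0/1=%.0f  sq=%5.2f  hinge=%.2f  log=%.3f" % row)

# At m = 3 (a very confident, correct prediction) the squared error is 4.00,
# while hinge is 0 and logistic is ~0.049: only squared error punishes confidence.

Note how zero-one, hinge, and logistic all flatten to (or approach) zero for m > 1, whereas squared error turns back up: that upturn is exactly the over-compensation described above.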
Image taken from the excellent The Elements of Statistical Learning.