A machine learning text I was reading said that it's a bad idea to approach a classification problem through regression. But it seems like it's always possible to fit the data with a continuous regression and then threshold the continuous prediction to get a discrete classification. So why is it a bad idea?
Tags: machine-learning
If you are doing classification, you want to optimize something related to misclassifications. You only care about predicting the right class. When you are doing regression, you want to minimize some measure of distortion between the prediction and the actual value. Mean squared error is a common penalty function for regression.
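As a toy illustration of the difference between the two objectives, the sketch below (with made-up numbers) computes a 0-1 misclassification rate and a mean squared error for the same continuous scores: a single score of 3.5 leaves the misclassification rate untouched but dominates the squared error.

```python
# A minimal sketch of the two objectives, using made-up numbers.
# Labels are encoded as 0/1; "scores" are continuous model outputs.
import numpy as np

y_true = np.array([1, 1, 0, 0])          # hypothetical class labels
scores = np.array([0.9, 3.5, 0.1, 0.4])  # hypothetical continuous predictions

# Classification objective: fraction misclassified after thresholding at 0.5.
y_pred = (scores >= 0.5).astype(int)
zero_one_loss = np.mean(y_pred != y_true)

# Regression objective: mean squared error against the numeric labels.
mse = np.mean((scores - y_true) ** 2)

print(zero_one_loss)  # 0.0 -- every example is on the correct side of 0.5
print(mse)            # ~1.6 -- dominated by the score of 3.5, even though its class is right
```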
Imagine optimizing the parameters of your regressor that is eventually going to do the classification. In comes an example that is obviously class 1, but whose numeric label is very, very large. In order to minimize the loss on this example, you have to shift your weights a lot to make the prediction extreme for that example. However, your classification border just moved a lot with it, hurting your classification accuracy: you over-compensated when you didn't need to.
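Here is a rough sketch of that story on made-up 1D data (assuming scikit-learn is available): all targets are 0 or 1 except one clearly-class-1 example that carries a huge numeric target. Thresholding a least-squares fit at 0.5 shifts the boundary into the class-0 cluster, while logistic regression, which models the classes directly, keeps it near 0.

```python
# A rough sketch (made-up 1D data, assuming scikit-learn) of the over-compensation
# story: one example that is clearly class 1 but carries a huge numeric target
# drags the least-squares fit, and the 0.5 cutoff moves into the class-0 cluster.
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2, 1, 50), rng.normal(2, 1, 50), [2.0]])
t = np.concatenate([np.zeros(50), np.ones(50), [50.0]])   # one extreme target value
X = x.reshape(-1, 1)
y = (t >= 0.5).astype(int)                                # the classes we actually care about

reg = LinearRegression().fit(X, t)       # regress on the raw targets, threshold later
clf = LogisticRegression().fit(X, y)     # model the classes directly

# Where each model's decision boundary ends up (prediction = 0.5 / probability = 0.5).
reg_boundary = (0.5 - reg.intercept_) / reg.coef_[0]
clf_boundary = -clf.intercept_[0] / clf.coef_[0, 0]
print("thresholded regression boundary:", round(float(reg_boundary), 2))  # pulled into the class-0 cluster
print("logistic regression boundary:   ", round(float(clf_boundary), 2))  # stays near 0

acc_reg = np.mean((reg.predict(X) >= 0.5).astype(int) == y)
acc_clf = np.mean(clf.predict(X) == y)
print("accuracy (regression + cutoff):", acc_reg)
print("accuracy (logistic):           ", acc_clf)
```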
You can read the figure referenced below (each loss plotted against the margin) as the amount you'll move your weights as a function of how badly you mis-predicted an example.
Most of the loss functions shown there are upper bounds on the misclassification loss, and models that optimize an upper bound on misclassification tend to classify well. Using regression for classification amounts to picking the squared-error loss and thereby misrepresenting what you actually want to optimize: the squared-error curve turns back upward on the right side of the graph, penalizing examples even as their classification becomes more and more confident, while the good classification losses are all either zero there or heading toward zero.
The figure referred to above is taken from the excellent The Elements of Statistical Learning (Hastie, Tibshirani & Friedman).
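For readers without the book at hand, a sketch along these lines (assuming matplotlib is available) reproduces the flavor of that figure: each loss is plotted against the margin y·f(x), and squared error is the one curve that turns back upward on the right, exactly where the classification is most confident.

```python
# A sketch (assuming matplotlib) of the kind of figure described above:
# several classification losses plotted against the margin m = y * f(x), y in {-1, +1}.
# Squared error is the only curve that grows again on the right, where the
# classification is already confidently correct.
import numpy as np
import matplotlib.pyplot as plt

m = np.linspace(-2, 2, 400)                       # margin y * f(x)
losses = {
    "misclassification (0-1)": (m < 0).astype(float),
    "hinge (SVM)": np.maximum(0.0, 1.0 - m),
    "binomial deviance (log loss)": np.log2(1.0 + np.exp(-m)),  # scaled to pass through 1 at m = 0
    "exponential (boosting)": np.exp(-m),
    "squared error": (1.0 - m) ** 2,
}
for name, loss in losses.items():
    plt.plot(m, loss, label=name)
plt.xlabel("margin  y * f(x)")
plt.ylabel("loss")
plt.ylim(0, 3)
plt.legend()
plt.show()
```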