Unbalanced labels - Better results in Confusion Matrix

Posted 2019-09-14 17:23

I have unbalanced labels. That is, in a binary classifier I have more positive (1) data and fewer negative (0) data. I'm using Stratified K-Fold Cross Validation and getting zero true negatives. Could you please let me know what options I have to get a value greater than zero for true negatives?

1 Answer
姐就是有狂的资本
Answered 2019-09-14 17:50

There are quite a lot of strategies for dealing with imbalanced classes.

First, let's understand what is (probably) happening. You are asking your classifier to maximize accuracy: that is, the fraction of records that were correctly classified. If, say, 85% of the records are in Class A, then you will get 85% accuracy by just labelling everything as Class A. And this seems to be the best the classifier can achieve.
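To see this concretely, here is a minimal sketch using scikit-learn's DummyClassifier on synthetic data; the 85/15 split and all variable names below are illustrative, not from the original question:

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, confusion_matrix

# Illustrative data: ~85% positives (1), ~15% negatives (0), random features.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = (rng.random(1000) < 0.85).astype(int)

# A baseline that always predicts the majority class.
baseline = DummyClassifier(strategy="most_frequent").fit(X, y)
pred = baseline.predict(X)

print(accuracy_score(y, pred))    # ~0.85, despite learning nothing
print(confusion_matrix(y, pred))  # top-left cell (true negatives) is 0
```

This is exactly the pattern you are seeing: high headline accuracy, zero true negatives.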

So, how can you correct for this?

1) You can try training your model on a balanced subset of your data: randomly sample from the majority class only as many records as are present in the minority class. This won't allow your classifier to get away with labelling everything as the majority class, but it comes at the cost of having less information available to discover the structure of the class boundary.

2) Use a different optimization metric than accuracy. Popular choices are AUC or the F1 score.

3) Create an ensemble of classifiers using method 1. Each classifier sees a different balanced subset of the data and 'votes' on a class, possibly with a confidence score. Each of these classifier outputs becomes a feature for a final meta-classifier (possibly built and evaluated using method 2). This way you still get access to all of the available information. A sketch combining methods 1-3 follows this list.
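As a rough illustration of how these three ideas fit together, here is a minimal sketch assuming scikit-learn; the helper `balanced_subsample`, the 85/15 class split, and the choice of logistic-regression members are all illustrative assumptions, not part of the original answer:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, roc_auc_score, confusion_matrix
from sklearn.model_selection import train_test_split

def balanced_subsample(X, y, rng):
    """Method 1: randomly undersample the majority class to match the minority."""
    pos, neg = np.where(y == 1)[0], np.where(y == 0)[0]
    major, minor = (pos, neg) if len(pos) > len(neg) else (neg, pos)
    keep = rng.choice(major, size=len(minor), replace=False)
    idx = np.concatenate([keep, minor])
    return X[idx], y[idx]

# Imbalanced toy data: ~85% positives (1), ~15% negatives (0).
X, y = make_classification(n_samples=2000, weights=[0.15, 0.85], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Method 3: an ensemble where each member sees a different balanced subsample.
rng = np.random.default_rng(0)
members = []
for _ in range(10):
    Xb, yb = balanced_subsample(X_tr, y_tr, rng)
    members.append(LogisticRegression(max_iter=1000).fit(Xb, yb))

# Soft vote: average the members' predicted probabilities.
proba = np.mean([m.predict_proba(X_te)[:, 1] for m in members], axis=0)
pred = (proba >= 0.5).astype(int)

# Method 2: judge the result with F1 / AUC rather than raw accuracy.
print("F1: ", f1_score(y_te, pred))
print("AUC:", roc_auc_score(y_te, proba))
print(confusion_matrix(y_te, pred))  # true negatives should now be > 0
```

Averaging probabilities over many balanced subsamples is the simplest form of method 3; the member outputs could equally be fed as features into a separate meta-classifier, as described above.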

This is far from an exhaustive list of solutions; working with imbalanced (or 'skewed') datasets could fill an entire textbook. I would recommend reading some papers on the topic, perhaps starting here.
