I have imbalanced labels. That is, in a binary classifier, I have more positive (1) data and fewer negative (0) data. I'm using Stratified K-Fold Cross Validation and getting zero true negatives. Could you please let me know what options I have to get a value greater than zero for true negatives?
Tags: machine-learning
There are quite a lot of strategies for dealing with imbalanced classes.
First, let's understand what is (probably) happening. You are asking your classifier to maximize accuracy: that is, the fraction of records that were correctly classified. If, say, 85% of the records are in Class A, then you will get 85% accuracy by just labelling everything as Class A, and this seems to be the best the classifier can achieve. Since it never predicts the negative class at all, you see zero true negatives, which is exactly the symptom you describe. The sketch below makes this concrete.
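Here is a minimal sketch of that failure mode on synthetic data (the 85/15 split is just for illustration): a "classifier" that predicts the majority class everywhere scores high accuracy while producing zero true negatives.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

rng = np.random.default_rng(0)
y_true = (rng.random(1000) < 0.85).astype(int)  # ~85% positives (1)
y_pred = np.ones_like(y_true)                   # predict the majority class everywhere

print("accuracy:", accuracy_score(y_true, y_pred))   # roughly 0.85
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print("true negatives:", tn)                         # 0
```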
So, how can you correct for this?
1) You can try training your model on a balanced sub-set of your data. Randomly sample from the majority class only a number of records equal to those present in the minority class (see the first sketch after this list). This won't allow your classifier to get away with labelling everything as the majority class, but it will come at the cost of having less information available to discover the structure of the class boundary.
2) Use a different optimization metric than accuracy. Popular choices are AUC or F1 score (see the second sketch below).
3) Create an ensemble of classifiers using method 1. Each classifier will see a subset of the data and 'vote' on a class, possibly with some confidence score. Each of these classifier outputs can then be a feature for a final meta-classifier (possibly built using method 2). This way you get access to all of the information available (the third sketch below shows the idea).
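A rough sketch of method 1, assuming your labels follow the question's convention (1 is the majority class). The `balanced_subsample` helper is illustrative, not a standard API; the synthetic dataset and `LogisticRegression` are placeholders for your own data and model.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic imbalanced data standing in for your X and y (~85% positives)
X, y = make_classification(n_samples=1000, weights=[0.15, 0.85], random_state=0)

def balanced_subsample(X, y, seed=0):
    """Randomly keep only as many majority-class rows as there are minority rows."""
    rng = np.random.default_rng(seed)
    pos = np.flatnonzero(y == 1)   # majority class in this question
    neg = np.flatnonzero(y == 0)   # minority class
    keep = rng.choice(pos, size=len(neg), replace=False)
    idx = np.concatenate([keep, neg])
    rng.shuffle(idx)
    return X[idx], y[idx]

X_bal, y_bal = balanced_subsample(X, y)
clf = LogisticRegression(max_iter=1000).fit(X_bal, y_bal)
```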
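And a short sketch of method 2: score your cross-validation folds with F1 or ROC AUC instead of accuracy. The `class_weight="balanced"` option is an extra beyond what the answer describes, but it pairs naturally with these metrics by penalizing mistakes on the rare class more heavily.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

X, y = make_classification(n_samples=1000, weights=[0.15, 0.85], random_state=0)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Reweight the classes and evaluate with imbalance-aware metrics
clf = LogisticRegression(class_weight="balanced", max_iter=1000)
print("F1:     ", cross_val_score(clf, X, y, cv=cv, scoring="f1").mean())
print("ROC AUC:", cross_val_score(clf, X, y, cv=cv, scoring="roc_auc").mean())
```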
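Finally, a sketch of method 3. To keep it short this averages the members' predicted probabilities ("soft voting") rather than feeding them into a full meta-classifier as the answer describes; all names here are illustrative.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, weights=[0.15, 0.85], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

def balanced_subsample(X, y, rng):
    pos, neg = np.flatnonzero(y == 1), np.flatnonzero(y == 0)
    keep = rng.choice(pos, size=len(neg), replace=False)
    idx = np.concatenate([keep, neg])
    return X[idx], y[idx]

# Train each ensemble member on its own balanced subsample
rng = np.random.default_rng(0)
members = [
    LogisticRegression(max_iter=1000).fit(*balanced_subsample(X_tr, y_tr, rng))
    for _ in range(10)
]

# Average the members' probabilities and threshold at 0.5
proba = np.mean([m.predict_proba(X_te)[:, 1] for m in members], axis=0)
y_pred = (proba >= 0.5).astype(int)

tn, fp, fn, tp = confusion_matrix(y_te, y_pred).ravel()
print("true negatives:", tn)   # no longer stuck at zero
```

If you'd rather not hand-roll this, the third-party imbalanced-learn package provides ready-made tools for the same pattern (e.g. `BalancedBaggingClassifier`).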
This is far from an exhaustive list of solutions. Working with imbalanced (or 'skewed') datasets could fill an entire textbook. I would recommend reading some papers on this topic, perhaps starting here.