I'm dealing with an imbalanced dataset and want to do a grid search to tune my model's parameters using scikit-learn's GridSearchCV. To oversample the data, I want to use SMOTE, and I know I can include it as a stage of a pipeline and pass that pipeline to GridSearchCV. My concern is that SMOTE will be applied to both the training and validation folds, which is not what you are supposed to do: the validation set should not be oversampled. Am I right that the whole pipeline will be applied to both dataset splits? And if so, how can I work around this? Thanks a lot in advance.
Yes, it can be done, but you need to use the imblearn Pipeline.
You see, imblearn has its own Pipeline to handle the samplers correctly. I described this in a similar question here.
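When predict() is called on an imblearn.pipeline.Pipeline object, it skips the sampling step and passes the data unchanged to the next transformer. Since GridSearchCV fits the pipeline on the training folds and only predicts and scores on the validation folds, the validation data is never oversampled. You can confirm that by looking at the source code here.
So for this to work correctly, you need the following: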
Fill in the details as necessary, and the pipeline will take care of the rest.