Oversampling: SMOTE for binary and categorical data

Posted 2019-05-07 00:21

I would like to apply SMOTE to an unbalanced dataset that contains binary, categorical, and continuous data. Is there a way to apply SMOTE to binary and categorical data?

Tags: python-3.x imputation

3 answers

戒情不戒烟

#2 · 2019-05-07 00:31

According to the documentation, SMOTE does not yet support categorical data in Python; it produces continuous outputs.

As a workaround, you can encode the categorical variables as integers and then apply SMOTE.

Afterwards, use np.round(X_train[categorical_variables]) to map the resampled values back to their categorical codes.

Ridiculous、

#3 · 2019-05-07 00:34

According to the documentation, this is now possible with SMOTENC. SMOTE-NC can handle a mix of categorical and continuous features.

Here is the code from the documentation:

from imblearn.over_sampling import SMOTENC
smote_nc = SMOTENC(categorical_features=[0, 2], random_state=0)
X_resampled, y_resampled = smote_nc.fit_resample(X, y)

霸刀☆藐视天下

#4 · 2019-05-07 00:46

As of January 2018, this feature had not been implemented in Python. The following is a reference from the team. In fact, they are open to proposals if someone wants to implement it.

For those with an academic interest in this issue, the paper from Chawla & Bowyer addresses the SMOTE-NC (Nominal Continuous) sampling problem in section 6.1.

Update: This feature was implemented as of 21 October 2018. The corresponding request is now closed.
