I would like to apply SMOTE to an unbalanced dataset that contains binary, categorical, and continuous data. Is there a way to apply SMOTE to binary and categorical data?
According to the documentation, SMOTE in Python does not yet support categorical data and produces continuous outputs.
As a workaround, you can convert the categorical variables to integers and then apply SMOTE.
Then use
np.round(X_train[categorical_variables])
to convert them back to the respective categorical values. Rounding interpolated codes is only meaningful for binary or ordinal variables; for nominal categories the rounded value can land on an unrelated category.

As per the documentation, this is now possible with SMOTENC. SMOTE-NC can handle a mix of categorical and continuous features.
Here is the code from the documentation:
from imblearn.over_sampling import SMOTENC
smote_nc = SMOTENC(categorical_features=[0, 2], random_state=0)
X_resampled, y_resampled = smote_nc.fit_resample(X, y)
As of January 2018, this feature has not been implemented in Python. Following is a reference from the team. In fact, they are open to proposals if someone wants to implement it.
For those with an academic interest in this issue, the SMOTE paper by Chawla, Bowyer, et al. addresses the problem in section 6.1, where it describes SMOTE-NC (Synthetic Minority Over-sampling Technique-Nominal Continuous).
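For intuition, here is a rough standard-library sketch of the idea in that section as I understand it: continuous features are interpolated as in plain SMOTE, a mismatch in a nominal feature adds the median of the continuous features' standard deviations to the distance, and each nominal feature of the synthetic sample takes the most common value among the nearest neighbours. The function name, the toy data, and the use of the population standard deviation are my own assumptions; this is a simplification and not how imbalanced-learn implements SMOTENC.

```python
import random
import statistics
from collections import Counter


def smote_nc_sample(minority, cont_idx, cat_idx, k=3, rng=None):
    """Build one synthetic minority row (simplified SMOTE-NC sketch)."""
    rng = rng or random.Random(0)

    # Median of the standard deviations of the continuous features.
    # Population std dev is an assumption made for this sketch.
    med = statistics.median(
        [statistics.pstdev([row[j] for row in minority]) for j in cont_idx]
    )

    def dist(a, b):
        # Euclidean distance on continuous features, plus a penalty of
        # med**2 for every nominal feature whose values differ.
        d2 = sum((a[j] - b[j]) ** 2 for j in cont_idx)
        d2 += sum(med ** 2 for j in cat_idx if a[j] != b[j])
        return d2 ** 0.5

    base = rng.choice(minority)
    others = [row for row in minority if row is not base]
    neighbours = sorted(others, key=lambda row: dist(base, row))[:k]
    partner = rng.choice(neighbours)
    gap = rng.random()  # interpolation factor in [0, 1)

    new = list(base)
    for j in cont_idx:
        # Continuous features: interpolate between base and partner.
        new[j] = base[j] + gap * (partner[j] - base[j])
    for j in cat_idx:
        # Nominal features: most common value among the k neighbours.
        new[j] = Counter(row[j] for row in neighbours).most_common(1)[0][0]
    return new


# Toy minority class: two continuous columns and one nominal column.
minority = [
    [1.0, 10.0, "a"],
    [1.2, 11.0, "a"],
    [0.9, 9.5, "b"],
    [1.1, 10.5, "a"],
    [1.3, 12.0, "b"],
]
synthetic = smote_nc_sample(minority, cont_idx=[0, 1], cat_idx=[2])
print(synthetic)
```

Because the nominal value is chosen by vote rather than interpolated, the synthetic row can only carry a category that already appears in the minority class.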
Update: This feature has been implemented as of 21 Oct, 2018. Service request stands closed now.