All binary predictors in a classification task

Posted 2019-07-29 10:10

Question:

I am performing my analysis in R and will be implementing four algorithms:

1. Random forest (RF)
2. Logistic regression
3. Support vector machine (SVM)
4. Linear discriminant analysis (LDA)

I have 50 predictors and 1 target variable. All of the predictors and the target variable are binary, taking only the values 0 and 1.

I have the following questions:

Should I convert them all into factors?
After converting them into factors, the RF algorithm gives 100% accuracy, which I am very surprised to see.
Also, how should I treat my variables before feeding them into the other algorithms?

Thanks

Answer 1:

If your variables / predictors are categorical, it is best to convert them to factors. Otherwise, they will likely be treated as numerical values.

If you are doing a classification task, it is best to have the target / response variable as a factor as well.
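
As a minimal sketch of the conversion described above, using base R only: the data frame `df` and the column names `x1`, `x2` and `y` are made up for illustration and stand in for the 50 predictors and the target.

```r
# Toy 0/1 data; df, x1, x2 and y are hypothetical names, not from the question
set.seed(1)
df <- data.frame(
  x1 = rbinom(20, 1, 0.5),
  x2 = rbinom(20, 1, 0.5),
  y  = rbinom(20, 1, 0.5)
)

# Convert every column (predictors and target) to a factor with levels 0 and 1.
# Assigning into df[] keeps the result a data frame instead of a plain list.
df[] <- lapply(df, factor, levels = c(0, 1))

# Inspect the result: each column should now be reported as a Factor
str(df)
```

Fixing `levels = c(0, 1)` explicitly keeps both levels even if a column happens to contain only one of the two values in a given sample. Whether the response is a factor matters for some functions: `randomForest`, for example, runs classification for a factor response and regression for a numeric one.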

It is also a good idea to check the documentation of the functions you use, to make sure they will not convert factors back to numerical values.
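
One way to see what a formula-based function would do with a factor is to look at the design matrix it builds. The sketch below assumes R's default treatment contrasts and uses made-up vectors `x_num` and `x_fac`. It suggests that for a two-level 0/1 variable, the dummy column produced from the factor matches the original numeric column, so for functions that work from such a design matrix (for example `glm`) the two encodings should lead to the same fit.

```r
# The same binary variable stored as numeric and as a factor (hypothetical data)
x_num <- c(0, 1, 1, 0, 1)
x_fac <- factor(x_num, levels = c(0, 1))

# Design matrices as a formula-based modelling function would build them
mm_num <- model.matrix(~ x_num)
mm_fac <- model.matrix(~ x_fac)

# Column 1 is the intercept; column 2 holds the predictor in each encoding
same_encoding <- all(mm_num[, 2] == mm_fac[, 2])
print(same_encoding)
```

This only covers functions that expand factors through a model matrix. Tree-based methods and functions with a non-formula interface may handle factors differently, which is why the documentation check is still worth doing.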



Answer 2:

Use AdaBoost.

Take a look at different Kaggle kernels, especially those for the Mercedes competition, to get an idea of how to implement AdaBoost.

https://www.kaggle.com/c/mercedes-benz-greener-manufacturing/kernels

That dataset is a mix of numerical variables, factors, and 0/1 variables.