Question:
Random forest is a robust algorithm. It trains many trees on bootstrap samples and already provides an out-of-bag (OOB) accuracy estimate. Is it still necessary to run cross-validation with a random forest as well?
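(For concreteness, here is a minimal sketch of the OOB estimate the question refers to, assuming scikit-learn's RandomForestClassifier; the dataset is a placeholder, not part of the original post.)

```python
# Minimal sketch of the OOB accuracy the question refers to,
# assuming scikit-learn (not part of the original post).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# oob_score=True evaluates each tree on the samples it did not see in
# its bootstrap draw; the aggregated result is stored in oob_score_.
rf = RandomForestClassifier(n_estimators=200, oob_score=True, random_state=0)
rf.fit(X, y)
print("OOB accuracy:", rf.oob_score_)
```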
Answer 1:
The OOB error is an unbiased estimate of the error for random forests, so that part is covered. But what are you using the cross-validation for? If you are comparing the RF against some other algorithm that isn't using bagging in the same way, you want a low-variance way to compare them, and you have to use cross-validation anyway to evaluate the other algorithm. Using the same cross-validation splits for both the RF and the other algorithm is then still a good idea, because it removes the variance caused by the split selection.
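As an illustration of sharing the same splits, here is a hypothetical sketch assuming scikit-learn: one fold object is reused for both models, so both are scored on identical splits (the dataset and the second classifier are placeholders, not from the answer).

```python
# Hypothetical sketch: score a random forest and another classifier on the
# same cross-validation folds, so fold selection adds no extra variance to
# the comparison. Dataset and models are placeholders.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# A single fold object reused for both models guarantees identical splits.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

rf_scores = cross_val_score(RandomForestClassifier(random_state=0), X, y, cv=cv)
lr_scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=cv)

print("RF mean accuracy:", rf_scores.mean())
print("LR mean accuracy:", lr_scores.mean())
```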
If you are comparing one RF against another RF with a different feature set, then comparing OOB errors is reasonable. This is especially true if you make sure both RFs use the same bagging sets during training.
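A rough sketch of that kind of comparison, again assuming scikit-learn: fixing the same random_state in both forests is one way to try to keep the bootstrap draws comparable, but that reading of the library's seeding behaviour is an assumption on my part, not something the answer states.

```python
# Hypothetical sketch: compare two forests trained on different feature
# subsets via their OOB scores. The shared random_state is intended to keep
# the bootstrap resampling comparable between the two forests (an assumption
# about scikit-learn's seeding, not something stated in the answer).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

def oob_accuracy(feature_indices):
    rf = RandomForestClassifier(n_estimators=300, oob_score=True, random_state=0)
    rf.fit(X[:, feature_indices], y)
    return rf.oob_score_

print("All features  :", oob_accuracy(list(range(20))))
print("First 10 only :", oob_accuracy(list(range(10))))
```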
Answer 2:
You do not need to perform any kind of validation if you just want to use the model and don't care about the risk of overfitting.
For scientific publishing (or anything else where you compare the quality of different classifiers), you should validate your results, and cross-validation is best practice here.