Size of sample in Random Forest Regression

If I understand correctly, Random Forest estimators are usually built with bootstrapping: tree(i) is built only from sample(i), which is drawn from the training data with replacement. I want to know what sample size sklearn's RandomForestRegressor uses.
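
To make the question concrete, here is a minimal NumPy sketch of drawing one bootstrap sample as described above (n row indices drawn with replacement). It is not scikit-learn's own code, and the training-set size and seed are arbitrary assumptions. It also shows that a bootstrap sample as large as the training set still leaves rows out: on average only about 1 - 1/e, roughly 63.2%, of the rows are distinct.

```python
import numpy as np

# Hypothetical training-set size and seed, chosen only for illustration.
n_samples = 10_000
rng = np.random.default_rng(0)

# One bootstrap sample: draw n_samples row indices *with replacement*.
bootstrap_indices = rng.integers(0, n_samples, size=n_samples)

# Some rows are drawn several times and others never; the never-drawn rows
# are the "out-of-bag" rows for this tree.
unique_fraction = np.unique(bootstrap_indices).size / n_samples
expected_fraction = 1 - (1 - 1 / n_samples) ** n_samples  # close to 1 - 1/e

print(len(bootstrap_indices), round(unique_fraction, 3), round(expected_fraction, 3))
```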

The only thing I can find in the documentation that comes close is:

bootstrap : boolean, optional (default=True)
    Whether bootstrap samples are used when building trees.

But there is no way to specify the size or proportion of the sample, and the documentation doesn't say what the default sample size is.

I feel like there should be a way to at least find out the default sample size. What am I missing?

Tags: python machine-learning scikit-learn random-forest

3 answers

甜甜的少女心

Post #2 · 2020-07-09 09:11

The bootstrap sample size is always equal to the number of training samples.

You are not missing anything; the same question was asked on the mailing list about RandomForestClassifier:

The bootstrap sample size is always the same as the input sample size. If you feel up to it, a pull request updating the documentation would probably be quite welcome.
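
One way to check this on a fitted model is sketched below, under an assumption: current scikit-learn versions implement the bootstrap by fitting every tree on the full X with per-row sample weights equal to how many times each row was drawn. If so, the root node's weighted_n_node_samples equals the bootstrap sample size, while n_node_samples counts only the distinct rows that were drawn. This is an implementation detail, not a documented guarantee, and the make_regression data is synthetic and only for illustration.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Synthetic data, purely for illustration.
X, y = make_regression(n_samples=200, n_features=5, random_state=0)

forest = RandomForestRegressor(n_estimators=10, bootstrap=True, random_state=0)
forest.fit(X, y)

for tree in forest.estimators_:
    root_weighted = tree.tree_.weighted_n_node_samples[0]  # rows drawn, with repeats
    root_distinct = tree.tree_.n_node_samples[0]  # distinct rows drawn
    print(root_weighted, root_distinct)
```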

爷的心禁止访问

Post #3 · 2020-07-09 09:26

I agree that it's quite strange that we cannot specify the subsample (bootstrap) size in RandomForestRegressor. A possible workaround is to use BaggingRegressor instead: http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.BaggingRegressor.html#sklearn.ensemble.BaggingRegressor

RandomForestRegressor is just a special case of BaggingRegressor (both use bootstrap samples to reduce the variance of a set of low-bias, high-variance estimators). In RandomForestRegressor the base estimator is forced to be a DecisionTreeRegressor, whereas BaggingRegressor lets you choose the base_estimator. More importantly, you can set a custom subsample size: for example, max_samples=0.5 draws random subsamples half the size of the training set. You can also use only a subset of the features by setting max_features and bootstrap_features.
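
A minimal sketch of the BaggingRegressor workaround, using synthetic make_regression data as an illustrative assumption. The tree is passed positionally because the keyword is base_estimator in older scikit-learn releases and estimator in newer ones. estimators_samples_ lists the row indices drawn for each estimator, so with 200 rows and max_samples=0.5 each entry should hold 100 indices (repeats included, since bootstrap=True).

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import BaggingRegressor
from sklearn.tree import DecisionTreeRegressor

# Synthetic data, purely for illustration.
X, y = make_regression(n_samples=200, n_features=5, random_state=0)

# Base estimator passed positionally: the keyword name differs across versions.
bagging = BaggingRegressor(
    DecisionTreeRegressor(),
    n_estimators=10,
    max_samples=0.5,  # each tree gets a sample half the size of the training set...
    bootstrap=True,  # ...drawn with replacement
    random_state=0,
)
bagging.fit(X, y)

# Row indices drawn for each base estimator.
print([len(idx) for idx in bagging.estimators_samples_])
```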

Fickle 薄情

Post #4 · 2020-07-09 09:30

Since scikit-learn 0.22, RandomForestRegressor has a max_samples option that does what you asked for; see the documentation of the class.
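
A sketch of max_samples on synthetic make_regression data (an illustrative assumption), requiring scikit-learn 0.22 or later. A float is read as a fraction of the training rows and an int as an absolute count, and max_samples must be combined with bootstrap=True (scikit-learn raises an error otherwise). The final check relies on an implementation detail: each tree is fit on all rows with sample weights equal to the bootstrap draw counts, so the root's weighted_n_node_samples should equal the number of rows drawn for that tree.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Synthetic data, purely for illustration.
X, y = make_regression(n_samples=200, n_features=5, random_state=0)

forest = RandomForestRegressor(
    n_estimators=10,
    bootstrap=True,  # required when max_samples is set
    max_samples=0.5,  # each bootstrap sample is half the size of the training set
    random_state=0,
)
forest.fit(X, y)

# Rows drawn per tree (repeats counted), read from each tree's root node.
print({tree.tree_.weighted_n_node_samples[0] for tree in forest.estimators_})
```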
