What do you need to watch out for when using cross

2020-05-02 18:34发布

Regarding h2o.glm lambda search not appearing to iterate over all lambdas, I read the question as complaining that lambda was too high; they tried setting early_stopping=F in the hope that might fix that "bug".

Isn't it the case that the original behaviour was a feature, not a bug? And if that is correct, then you should always use early_stopping=T when using cross-validation with GLM, otherwise the error estimate from cross-validation is useless; you also risk over-fitting.

(My main question is if my understanding of the way GLM and CV work together is correct; but I'd be interested if there are any other things to watch out for when using lambda_search and cross-validation together.)

1条回答
The star\"
2楼-- · 2020-05-02 18:48

H2O's glm with lambda search and cross-validation should always pick the best lambda based on cross-validation and use that in the returned (main) model. The early stopping option should have no effect on selected lambda. Its purpose is to skip computation of models for lambdas > best since they are not needed for the main model (we still compute models for lambdas < best since that allows to use warm starting and take full advantage of strong rules).

I think the behavior with early_stopping set to false should compute models for all lambdas in case user wants to see them / do custom model selection.

查看更多
登录 后发表回答