What is the difference between h2o.ensemble and h2o

Published 2020-07-15 09:38

Question:

According to the function descriptions:

h2o.stack: This function creates a "Super Learner" (stacking) ensemble using a list of existing H2O base models specified by the user.

h2o.ensemble: This function creates a "Super Learner" (stacking) ensemble using the H2O base learning algorithms specified by the user.

Answer 1:

They are two different ways to construct an ensemble. They have different interfaces, but both produce exactly the same type of object in the end.

The h2o.stack() function takes as input a list of already trained (and cross-validated) H2O models, so all it needs to do is the metalearning (combiner) step, which is very fast. This is useful if you want to use a grid of H2O models, or a collection of grids of H2O models, as the base learners. The only caveat is that all the base learners must have used identical cross-validation folds. Setting fold_assignment = "Modulo" (with the same nfolds) in all the base learners (or grids) ensures identical folds.
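To make the fold requirement concrete, here is a minimal pure-Python sketch of the metalearning step that h2o.stack() performs. It assumes that "Modulo" fold assignment puts row i in fold i % nfolds, and it uses a deliberately simplified metalearner (a least-squares convex combination of exactly two models) in place of the real one. The functions modulo_folds, check_identical_folds and stack, and the model names "gbm" and "glm", are hypothetical illustrations, not h2oEnsemble functions.

```python
from typing import Dict, List, Sequence


def modulo_folds(n_rows: int, nfolds: int) -> List[int]:
    # Assumed Modulo scheme: row i goes to fold i % nfolds. Because it is
    # deterministic, every base model trained on the same rows with the
    # same nfolds ends up with exactly the same folds.
    return [i % nfolds for i in range(n_rows)]


def check_identical_folds(fold_ids: Dict[str, Sequence[int]]) -> None:
    # The level-one data is only valid if all base models held out the same rows.
    reference = list(next(iter(fold_ids.values())))
    for name, folds in fold_ids.items():
        if list(folds) != reference:
            raise ValueError(f"base model {name!r} used different CV folds")


def stack(cv_preds: Dict[str, Sequence[float]],
          fold_ids: Dict[str, Sequence[int]],
          y: Sequence[float]) -> Dict[str, float]:
    """Metalearning step only, for two already cross-validated base models.

    Simplified metalearner (a stand-in, not H2O's): choose w in [0, 1] that
    minimises the squared error of w * a + (1 - w) * b, where a and b are the
    two models' out-of-fold (holdout) predictions.
    """
    check_identical_folds(fold_ids)
    (name_a, a), (name_b, b) = cv_preds.items()
    d = [ai - bi for ai, bi in zip(a, b)]   # a - b
    r = [yi - bi for yi, bi in zip(y, b)]   # y - b
    denom = sum(di * di for di in d)
    # Closed-form least-squares weight w = sum(d * r) / sum(d * d);
    # if both models predict identically, any weight is optimal, so use 0.5.
    w = sum(di * ri for di, ri in zip(d, r)) / denom if denom else 0.5
    w = min(1.0, max(0.0, w))               # clip to a convex combination
    return {name_a: w, name_b: 1.0 - w}


# Usage: no base model is retrained here, which is why this step is fast.
y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
folds = modulo_folds(len(y), nfolds=3)      # [0, 1, 2, 0, 1, 2]
cv_preds = {
    "gbm": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],  # perfect holdout predictions
    "glm": [3.5] * 6,                       # always predicts the mean
}
print(stack(cv_preds, {"gbm": folds, "glm": folds}, y))  # {'gbm': 1.0, 'glm': 0.0}
```

If one base model had used random folds instead, check_identical_folds would raise, because its holdout predictions would no longer line up row for row with the other model's.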
The h2o.ensemble() function allows the user to specify which base models they want in the ensemble, then does all of the training and cross-validation of the base models, and then does the metalearning (combiner) step as well. This takes much longer since it has to train all the base models too.
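The extra cost of h2o.ensemble() comes from the base-learner step. Below is a minimal pure-Python sketch of that step under the standard Super Learner recipe, as an assumption about what happens internally: each base learner is cross-validated (nfolds fits) and then refit on all rows (one more fit), and the out-of-fold predictions become the level-one data the metalearner is trained on. The toy learners mean_learner and linear_learner and the function ensemble_base_step are hypothetical illustrations; the h2oEnsemble package trains real H2O algorithms instead.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical stand-in for an H2O algorithm: train on (x, y), return a predictor.
Predictor = Callable[[float], float]
Learner = Callable[[Sequence[float], Sequence[float]], Predictor]


def mean_learner(x: Sequence[float], y: Sequence[float]) -> Predictor:
    m = sum(y) / len(y)
    return lambda _xi: m


def linear_learner(x: Sequence[float], y: Sequence[float]) -> Predictor:
    # One-feature ordinary least squares: y = slope * x + intercept.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx if sxx else 0.0
    intercept = my - slope * mx
    return lambda xi: slope * xi + intercept


def ensemble_base_step(learners: Dict[str, Learner], x: Sequence[float],
                       y: Sequence[float], nfolds: int = 5
                       ) -> Tuple[Dict[str, List[float]], Dict[str, Predictor], int]:
    """Cross-validate every base learner, then refit each on all rows.

    Returns the out-of-fold predictions (the level-one data the metalearner
    is trained on), the final base models, and how many models were fitted.
    """
    n = len(x)
    folds = [i % nfolds for i in range(n)]  # Modulo-style folds, shared by all learners
    level_one: Dict[str, List[float]] = {}
    final_models: Dict[str, Predictor] = {}
    fits = 0
    for name, learner in learners.items():
        oof = [0.0] * n
        for k in range(nfolds):
            train = [i for i in range(n) if folds[i] != k]
            model = learner([x[i] for i in train], [y[i] for i in train])
            fits += 1
            for i in range(n):
                if folds[i] == k:
                    oof[i] = model(x[i])    # predict only rows held out of this fit
        level_one[name] = oof
        final_models[name] = learner(x, y)  # full-data fit used at prediction time
        fits += 1
    return level_one, final_models, fits


x = [float(i) for i in range(10)]
y = [2.0 * xi + 1.0 for xi in x]
level_one, models, fits = ensemble_base_step(
    {"mean": mean_learner, "linear": linear_learner}, x, y, nfolds=5)
print(fits)  # 12, i.e. 2 learners * (5 CV fits + 1 full fit)
```

With L base learners and k folds, this step fits L * (k + 1) models before the metalearner even starts. That is the work h2o.stack() skips by reusing models you have already cross-validated.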

As of the latest stable release (H2O 3.10.3.*), stacking is now available natively in H2O (R, Python, Java, Scala) as the "Stacked Ensemble" method. More info on that here. However, the h2oEnsemble R package (where the h2o.ensemble() and h2o.stack() functions live) will continue to be supported as well.