Running h2o.automl() returns a single model in the leaderboard; however, when I try to access the actual model via @leader@model, the following error ensues:
Error in is.H2OFrame(x) : trying to get slot "metrics" from an object of a basic class ("NULL") with no slots
Likewise, calling h2o.predict() on the leader model gives this error message:
Error in .h2o.doSafeREST(h2oRestApiVersion = h2oRestApiVersion, urlSuffix = page, : ERROR MESSAGE: Object 'dummy' not found in function: predict for argument: model
The model was run in the same session, using h2o v3.20.0.2 in R.
I think what's happening is that you're not able to train a single model in one hour, so when you try to collect the leader model, it's trying to grab an incomplete model and you get an error. You don't have very many rows, but you have a really large number of columns.
First, since it's hard to predict how long the model training will take, I'd use the max_models argument instead of limiting by time. Since AutoML will stop when it reaches the first of max_models or max_runtime_secs, I'd set max_runtime_secs to a very large number (e.g. 999999999) and then set max_models = 10 or whatever number you like.

Second, since you have very wide data, I'd recommend turning off the Random Forest and GBM models and leaving the GLM and Deep Learning models. To do that, set exclude_algos = c("DRF", "GBM"). It will take a really long time to train tree-based models on 120k columns.

Another good option to consider is to first apply PCA or GLRM to your data to reduce the dimensionality to <500 columns; then you can include the tree-based models in the AutoML run.
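Putting those suggestions together, a rough sketch might look like the following. The function names run_automl_capped and reduce_with_pca, the default k = 100, the seed, and the PCA options are my own placeholders rather than anything from the question; train is assumed to be an H2OFrame already loaded into a running cluster and y the name of its response column. Note that k cannot exceed the smaller of the row and column counts, and that PCA on this many columns may itself be slow.

```r
# Sketch only: h2o:: prefixes are used so nothing runs until the functions are called.
# Assumes h2o::h2o.init() has been called and `train` is an H2OFrame.

# Limit the run by model count instead of by time, and skip tree-based models.
run_automl_capped <- function(train, y, x = setdiff(colnames(train), y),
                              max_models = 10) {
  h2o::h2o.automl(
    x = x, y = y,
    training_frame = train,
    max_models = max_models,          # stop after this many models
    max_runtime_secs = 999999999,     # effectively no time limit
    exclude_algos = c("DRF", "GBM"),  # drop Random Forest and GBM on wide data
    seed = 1                          # placeholder seed
  )
}

# Alternative: reduce the predictors to k principal components first,
# then run AutoML (tree-based models included) on the reduced frame.
reduce_with_pca <- function(train, y, k = 100) {
  x <- setdiff(colnames(train), y)
  pca <- h2o::h2o.prcomp(
    training_frame = train, x = x, k = k,
    transform = "STANDARDIZE", impute_missing = TRUE
  )
  scores <- h2o::h2o.predict(pca, train)  # columns PC1 ... PCk
  h2o::h2o.cbind(scores, train[, y])      # re-attach the response column
}

# Hypothetical usage ("response" is a placeholder column name):
# aml <- run_automl_capped(train, y = "response")
# print(aml@leaderboard)
# pred <- h2o::h2o.predict(aml@leader, train)
```

Once at least one model has finished training, aml@leader should hold a complete model, so the prediction call no longer has an incomplete model to trip over.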