Consistent results with multiple runs of h2o deeplearning

Posted 2019-09-11 04:18

Question:

For a certain combination of parameters in the deeplearning function of h2o, I get different results each time I run it.

# Extra parameters for one run; EPOCHS is defined elsewhere
args <- list(list(hidden = c(200, 200, 200),
                  loss = "CrossEntropy",
                  hidden_dropout_ratios = c(0.1, 0.1, 0.1),
                  activation = "RectifierWithDropout",
                  epochs = EPOCHS))

# Merge the shared arguments with the extra parameters and train;
# `columns`, `training`, and `validation` are defined elsewhere
run <- function(extra_params) {
  model <- do.call(h2o.deeplearning,
                   modifyList(list(x = columns, y = c("Response"),
                                   validation_frame = validation,
                                   distribution = "multinomial",
                                   l1 = 1e-5, balance_classes = TRUE,
                                   training_frame = training),
                              extra_params))
}

model <- lapply(args, run) 

What would I need to do in order to get consistent results for the model each time I run this?

Answer 1:

Deep learning with H2O is not reproducible when it runs on more than a single core: the results and performance metrics may vary slightly each time you train the model. H2O's implementation uses a technique called "Hogwild!", which speeds up training at the cost of reproducibility on multiple cores.

So if you want reproducible results, you will need to restrict H2O to a single core and set a seed in the h2o.deeplearning call.

Edit, based on a comment by Darren Cook: I forgot to include the reproducible = TRUE parameter, which needs to be set in combination with the seed to make the run truly reproducible. Note that this will make training much slower, and it is not advisable with a large dataset.
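Putting those pieces together, a minimal sketch of a reproducible call might look like the following (assuming the frames and column names from the question, and an arbitrary seed value):

```r
library(h2o)

# Restrict the H2O cluster to a single core; Hogwild! is only
# non-deterministic when several cores update weights concurrently.
h2o.init(nthreads = 1)

# `columns`, `training`, and `validation` are assumed to be defined
# as in the question above.
model <- h2o.deeplearning(x = columns, y = "Response",
                          training_frame = training,
                          validation_frame = validation,
                          hidden = c(200, 200, 200),
                          activation = "RectifierWithDropout",
                          hidden_dropout_ratios = c(0.1, 0.1, 0.1),
                          distribution = "multinomial",
                          l1 = 1e-5,
                          balance_classes = TRUE,
                          reproducible = TRUE,  # force deterministic (slower) training
                          seed = 1234)          # fix the RNG seed
```

With reproducible = TRUE and a fixed seed, repeated runs on the same data should produce the same model, at the cost of single-threaded training speed.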

More information on "Hogwild!"