I'm trying to build a random forest classification model with the H2O
library in R on a training set of 70 million rows and 25 numeric features. The total file size is 5.6 GB.
The validation file is 1 GB.
My system has 16 GB of RAM and an 8-core CPU.
Both files load successfully into H2O objects.
I then run the following command to build the model:
model <- h2o.randomForest(x = c(1:18,20:25), y = 19, training_frame = traindata,
validation_frame = testdata, ntrees = 150, mtries = 6)
But after a few minutes (before any tree has been built), I get the following error:
"Error in .h2o.doSafeREST(conn = conn, h2oRestApiVersion = h2oRestApiVersion, : Unexpected CURL error: Recv failure: Connection reset by peer"
However, if I run the same code with 1 tree, it completes successfully.
Is this error caused by a memory issue? Any help will be appreciated.
It's an OutOfMemoryError. A variation of this error message on the R side is:
Checking the h2o server logs, which you should do as well, will tell you:
I am running this on h2o Slater (3.2.0.5), so the exact output may vary depending on your version.
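One way to get at those server logs from the R session, as a rough sketch. `h2o.clusterStatus()` and `h2o.downloadAllLogs()` are functions in the h2o R package, but their arguments and output have changed between releases, so check the calls against your installed version. The directory name `h2o_logs` is only a placeholder. If the H2O process has already died, the connection step will fail and the logs have to be read from wherever the JVM wrote them.

```r
library(h2o)

# Attach to the H2O instance that is already running (default port 54321).
# startH2O = FALSE avoids launching a fresh instance by accident.
h2o.init(startH2O = FALSE)

# Print per-node status for the cluster, which includes memory figures.
h2o.clusterStatus()

# Download the server logs as an archive into a local directory
# ("h2o_logs" is a placeholder), then search them for "OutOfMemoryError".
dir.create("h2o_logs", showWarnings = FALSE)
h2o.downloadAllLogs(dirname = "h2o_logs")
```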
You're probably out of memory. Try watching the system's memory usage while the forest is growing. Also try launching the training directly from the H2O web console (http://localhost:54321/ by default); it may give a more detailed error.
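A rough sketch of what acting on this could look like, assuming memory really is the bottleneck: restart H2O with an explicit heap size and grow a smaller forest first. The values below (`12g`, 50 trees, depth 15, 50% row sampling) are illustrative guesses rather than tuned recommendations, and the file paths are placeholders. As a back-of-the-envelope check, 70 million rows × 25 numeric columns × 8 bytes is about 14 GB uncompressed. H2O compresses columns in memory, so the real footprint is usually smaller, but the trees need heap space on top of the data.

```r
library(h2o)

# Back-of-the-envelope size of the raw training data stored as 8-byte doubles.
rows <- 70e6
cols <- 25
raw_gb <- rows * cols * 8 / 1e9   # size in GB before any in-memory compression
print(raw_gb)

# Start H2O with an explicit heap limit, leaving some of the 16 GB for R and
# the OS. "12g" is an assumption, not a recommendation.
h2o.init(nthreads = -1, max_mem_size = "12g")

# Placeholder paths: substitute the real training and validation files.
traindata <- h2o.importFile(path = "train.csv")
testdata  <- h2o.importFile(path = "valid.csv")

# Same call as in the question, but with fewer, shallower trees and
# row subsampling to reduce memory pressure (illustrative values).
model <- h2o.randomForest(
  x = c(1:18, 20:25), y = 19,
  training_frame = traindata,
  validation_frame = testdata,
  ntrees = 50,
  mtries = 6,
  max_depth = 15,
  sample_rate = 0.5
)
```

If this smaller configuration trains cleanly, `ntrees` and `max_depth` can be raised step by step while watching memory usage.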