How to create an H2OFrame using the H2O REST API

Posted 2019-05-23 02:02

Question:

Is it possible to create an H2OFrame using H2O's REST API, and if so, how?

My main objective is to utilize models stored inside H2O so as to make predictions on external H2OFrames.

I need to be able to generate those H2OFrames externally from JSON (I suppose by calling an endpoint).

I read the API documentation but couldn't find any clear explanation.

I believe the closest endpoints are /3/CreateFrame (which creates random data) and /3/ParseSetup, but I couldn't find any reliable tutorial.

Answer 1:

Currently there is no REST API endpoint that directly converts a JSON record into a Frame object. The only way forward is therefore to write the data to a CSV file first, upload it to H2O using POST /3/PostFile, and then parse it using POST /3/Parse.

(Note that the POST /3/PostFile endpoint is not in the documentation, because it is handled separately from the other endpoints. Basically, it is an endpoint that takes an arbitrary file in the body of the POST request and saves it as a "raw data file".)
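
For illustration, the upload-then-parse route might look roughly like the Python sketch below. It assumes the third-party requests library, an H2O node at http://localhost:54321, and flat JSON records. The helpers records_to_csv and upload_and_parse and the frame names are hypothetical, and the ParseSetup/Parse field names are written from memory of H2O-3, so check them against the schema of your version.

```python
import csv
import io
import json


def records_to_csv(records):
    """Serialize a list of flat JSON objects (dicts) to CSV text with a header row."""
    # Column order is the order in which keys are first seen across all records.
    columns = []
    for rec in records:
        for key in rec:
            if key not in columns:
                columns.append(key)
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)  # keys missing from a record become empty cells
    return buf.getvalue()


def upload_and_parse(records, frame_id, base_url="http://localhost:54321"):
    """Upload records as CSV and start parsing them into the frame `frame_id`."""
    import requests  # third-party; imported here so records_to_csv works without it

    raw_key = frame_id + ".csv"  # hypothetical key for the raw uploaded file

    # Step 1: upload the CSV as a multipart file; H2O stores it as a raw data file.
    resp = requests.post(
        base_url + "/3/PostFile",
        params={"destination_frame": raw_key},
        files={"file": (raw_key, records_to_csv(records))},
    )
    resp.raise_for_status()

    # Step 2: let H2O guess the separator, header and column types.
    resp = requests.post(
        base_url + "/3/ParseSetup",
        data={"source_frames": json.dumps([raw_key])},
    )
    resp.raise_for_status()
    setup = resp.json()

    # Step 3: parse the raw file, echoing back the guessed settings.
    # Field names below are assumptions based on the H2O-3 Parse schema.
    parse_args = {
        "destination_frame": frame_id,
        "source_frames": json.dumps([raw_key]),
        "parse_type": setup["parse_type"],
        "separator": setup["separator"],
        "number_columns": setup["number_columns"],
        "single_quotes": json.dumps(setup["single_quotes"]),
        "column_names": json.dumps(setup["column_names"]),
        "column_types": json.dumps(setup["column_types"]),
        "check_header": setup["check_header"],
        "chunk_size": setup["chunk_size"],
        "delete_on_done": "true",
    }
    resp = requests.post(base_url + "/3/Parse", data=parse_args)
    resp.raise_for_status()
    return resp.json()  # describes the asynchronous parse job
```

Parsing runs as a job, so the returned JSON describes a job rather than a finished frame. The job can be polled at GET /3/Jobs/<job key> until it completes, after which the frame can be used for scoring.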

The same job is much easier with the H2O Python or R client. For example, to upload a dataset into H2O for scoring, you only need to say

df = h2o.H2OFrame(plaindata)
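
A slightly fuller sketch of that one-liner follows, assuming the h2o Python package is installed, a cluster is reachable at the given URL, and the model already exists on that cluster. The names records_to_columns and predict_records are hypothetical helpers; h2o.connect, h2o.H2OFrame, h2o.get_model and predict are the client calls doing the work.

```python
def records_to_columns(records):
    """Turn a list of flat JSON objects into {column: [values]}; missing keys become None."""
    columns = {}
    for rec in records:
        for key in rec:
            columns.setdefault(key, [])
    for rec in records:
        for key in columns:
            columns[key].append(rec.get(key))
    return columns


def predict_records(records, model_id, url="http://localhost:54321"):
    """Build an H2OFrame from JSON records and score it with a model stored in H2O."""
    import h2o  # third-party (pip install h2o); imported here so the helper above runs without it

    h2o.connect(url=url)                               # attach to an already running cluster
    frame = h2o.H2OFrame(records_to_columns(records))  # client uploads and parses the data
    model = h2o.get_model(model_id)                    # model_id must already exist on the cluster
    return model.predict(frame)                        # an H2OFrame of predictions
```

The client infers column types on upload, so it is worth checking that they match the types of the training frame (categorical versus numeric in particular) before trusting the predictions.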


Answer 2:

I am already doing something similar in my project. Since there is no REST API endpoint to directly convert a JSON record into a Frame object, I do the following:

1. For model building: first transfer the data and write it to a CSV file on the machine where the H2O server or cluster is running. Then import the data into H2O using POST /3/ImportFiles, parse it, build a model, and so on. I use the h2o-bindings APIs (RESTful APIs) for this. Because my data is large (hundreds of MB to a few GB), I use /3/ImportFiles instead of POST /3/PostFile, as the latter is slow for uploading large data.

2. For model scoring or prediction: I use the model's MOJO and POJO. In your case, you can use POST /3/PostFile as suggested by @Pasha, provided your data is not large. However, according to the H2O documentation, it is advisable to use the MOJO or POJO for scoring in a production environment rather than calling the H2O server or cluster directly. MOJO and POJO are thread safe, so you can scale them using multithreading for concurrent requests.
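
The two paths above could be sketched in Python as follows. The first part builds the POST /3/ImportFiles request for a file that already sits where the H2O node can read it; the second scores a CSV offline with a downloaded MOJO through the h2o package's mojo_predict_csv helper. The function names, URL and paths are hypothetical, the form-encoded path parameter is an assumption about the ImportFiles schema, and MOJO scoring additionally assumes Java and h2o-genmodel.jar are available next to the MOJO zip.

```python
import json
import os
import urllib.parse
import urllib.request


def build_import_request(base_url, server_path):
    """Build (without sending) a POST /3/ImportFiles request for a server-side file."""
    body = urllib.parse.urlencode({"path": server_path}).encode("ascii")
    return urllib.request.Request(base_url + "/3/ImportFiles", data=body, method="POST")


def import_server_file(base_url, server_path):
    """Send the import request and return H2O's JSON response as a dict."""
    with urllib.request.urlopen(build_import_request(base_url, server_path)) as resp:
        return json.load(resp)


def score_csv_with_mojo(input_csv, mojo_zip, output_csv):
    """Score a CSV locally with a MOJO; no running H2O cluster is involved."""
    for path in (input_csv, mojo_zip):
        if not os.path.isfile(path):
            raise FileNotFoundError(path)
    import h2o  # third-party; only needed for the actual scoring call

    # With no genmodel_jar_path given, the helper looks for h2o-genmodel.jar
    # in the same directory as the MOJO zip.
    return h2o.mojo_predict_csv(
        input_csv_path=input_csv,
        mojo_zip_path=mojo_zip,
        output_csv_path=output_csv,
    )
```

A MOJO zip and the matching h2o-genmodel.jar can be fetched from a trained model with model.download_mojo(path=..., get_genmodel_jar=True). Note that mojo_predict_csv launches a Java process per call, so it suits batch scoring; the thread-safe, in-process scoring mentioned above refers to using the h2o-genmodel library directly from Java.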



Tags: rest h2o