Google Cloud ML-engine supports deploying scikit-learn Pipeline objects.
For example, a text classification Pipeline could look like the following:
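from sklearn import naive_bayes
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import Pipeline

classifier = Pipeline([
    ('vect', CountVectorizer()),
    ('clf', naive_bayes.MultinomialNB())])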
The classifier can then be trained:
classifier.fit(train_x, train_y)
The trained classifier can then be saved and uploaded to Google Cloud Storage:
model = 'model.joblib'
joblib.dump(classifier, model)
model_remote_path = os.path.join('gs://', bucket_name, datetime.datetime.now().strftime('model_%Y%m%d_%H%M%S'), model)
subprocess.check_call(['gsutil', 'cp', model, model_remote_path], stderr=sys.stdout)
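One caveat with the upload step: os.path.join uses the local path separator, so on Windows the remote path would contain backslashes. Below is a minimal sketch of building the gs:// URI with posixpath instead. The helper name build_remote_path and the bucket name in the usage line are illustrative and not part of the original code.

```python
import datetime
import posixpath


def build_remote_path(bucket_name, filename, now=None):
    """Return a URI of the form gs://<bucket>/model_<timestamp>/<filename>."""
    if now is None:
        now = datetime.datetime.now()
    folder = now.strftime('model_%Y%m%d_%H%M%S')
    # posixpath.join always uses '/', regardless of the local operating system.
    return 'gs://' + posixpath.join(bucket_name, folder, filename)


if __name__ == '__main__':
    # 'my-bucket' is a placeholder bucket name.
    print(build_remote_path('my-bucket', 'model.joblib'))
```

The result can be passed to gsutil cp in place of model_remote_path.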
Then a Model and a Version can be created, either through the Google Cloud Console or programmatically, linking the 'model.joblib' file to the Version.
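For the programmatic route, here is a rough sketch using the same discovery client as the prediction call. It assumes the v1 API's projects.models.create and projects.models.versions.create methods. The function names are my own. As far as I understand, deploymentUri should point at the Cloud Storage folder that contains model.joblib rather than at the file itself. The runtime and Python versions are left as parameters because the supported values change over time.

```python
def model_body(model_name):
    """Request body for projects.models.create."""
    return {'name': model_name}


def version_body(version_name, model_dir_uri, runtime_version, python_version):
    """Request body for projects.models.versions.create (scikit-learn model)."""
    return {
        'name': version_name,
        # Assumed to be the gs:// folder holding model.joblib, not the file.
        'deploymentUri': model_dir_uri,
        'runtimeVersion': runtime_version,
        'framework': 'SCIKIT_LEARN',
        'pythonVersion': python_version,
    }


def create_model_and_version(ml, project_name, model_name, version):
    """Create the Model, then a Version under it.

    `ml` is the client returned by discovery.build('ml', 'v1').
    `version` is a dict produced by version_body().
    """
    project_path = 'projects/{}'.format(project_name)
    ml.projects().models().create(
        parent=project_path, body=model_body(model_name)).execute()
    model_path = '{}/models/{}'.format(project_path, model_name)
    return ml.projects().models().versions().create(
        parent=model_path, body=version).execute()
```

Version creation is a long-running operation, so the returned object describes the operation rather than a finished Version.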
This classifier can then be used to predict new data by calling the deployed model's predict endpoint:
from googleapiclient import discovery

ml = discovery.build('ml', 'v1')
# Despite its name, project_id holds the full resource name of the model (and version).
project_id = 'projects/{}/models/{}'.format(project_name, model_name)
if version_name is not None:
    project_id += '/versions/{}'.format(version_name)
request_dict = {'instances': ['Test data']}
ml_request = ml.projects().predict(name=project_id, body=request_dict).execute()
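The dictionary returned by execute() carries the results under a 'predictions' key, or an 'error' key if the request failed. A small sketch for unpacking it follows; the helper name extract_predictions is my own.

```python
def extract_predictions(response):
    """Return the list of predictions from an online prediction response.

    Raises RuntimeError if the service reported an error instead.
    """
    if 'error' in response:
        raise RuntimeError(response['error'])
    return response['predictions']
```

With the code above this would be called as extract_predictions(ml_request), giving one entry per item in 'instances'.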
The Google Cloud ML-engine calls the predict function of the classifier and returns the predicted class. However, I would like to be able to return the confidence score. Normally this could be achieved by calling the predict_proba function of the classifier; however, there doesn't seem to be an option to change which function is called. My question is: Is it possible to return the confidence score for a scikit-learn classifier when using the Google Cloud ML-engine? If not, would you have any recommendations as to how else to achieve this result?
Update:
I've found a hacky solution. It involves overwriting the predict function of the classifier with its own predict_proba function:
nb = naive_bayes.MultinomialNB()
# Pipeline.predict delegates to the final step, so this makes it return probabilities.
nb.predict = nb.predict_proba
classifier = Pipeline([
    ('vect', CountVectorizer()),
    ('clf', nb)])
Surprisingly, this works. If anyone knows of a neater solution, please let me know.
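With this workaround, each prediction should come back as a list of probabilities, one per class, in the order of classifier.classes_. The endpoint does not return the labels, so they need to be kept on the client side. Here is a minimal sketch of pairing them up. The helper names and the 'ham'/'spam' labels are made up for illustration.

```python
def label_probabilities(classes, probabilities):
    """Pair each row of probabilities with the class labels, in order."""
    return [dict(zip(classes, row)) for row in probabilities]


def top_prediction(classes, row):
    """Return (label, confidence) for the most probable class in one row."""
    index = max(range(len(row)), key=row.__getitem__)
    return classes[index], row[index]


if __name__ == '__main__':
    classes = ['ham', 'spam']  # e.g. list(classifier.classes_) saved at training time
    rows = [[0.2, 0.8]]        # e.g. the 'predictions' list from the response
    print(label_probabilities(classes, rows))
    print(top_prediction(classes, rows[0]))
```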
Update: Google has released a new feature (currently in beta) called Custom prediction routines. This allows you to define what code is run when a prediction request comes in. It adds more code to the solution, but it is certainly less hacky.
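For reference, a rough sketch of what such a predictor might look like for this use case. As I understand the interface, the service expects a class with a predict(instances, **kwargs) method and a from_path(model_dir) classmethod. The class name ProbabilityPredictor is my own, and the model is assumed to have been saved as model.joblib with joblib.

```python
import os


class ProbabilityPredictor(object):
    """Predictor that returns class probabilities instead of class labels."""

    def __init__(self, model):
        self._model = model

    def predict(self, instances, **kwargs):
        """Return one {label: probability} dict per instance."""
        probabilities = self._model.predict_proba(instances)
        classes = [str(label) for label in self._model.classes_]
        # Convert to plain floats and strings so the result is JSON serializable.
        return [
            {label: float(p) for label, p in zip(classes, row)}
            for row in probabilities
        ]

    @classmethod
    def from_path(cls, model_dir):
        """Load model.joblib from the directory the service provides."""
        import joblib  # imported here so the class itself has no hard dependency
        return cls(joblib.load(os.path.join(model_dir, 'model.joblib')))
```

The original Pipeline can then be deployed unmodified, without overwriting predict.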