I trained a classification model in Apache Spark (using pyspark). I stored the model in an object, LogisticRegressionModel. Now I want to make predictions on new data. I would like to store the model and read it back into a new program in order to make the predictions. Any idea how to store the model? I'm thinking of maybe pickle, but I'm a newbie to both Python and Spark, so I'd like to hear what the community thinks.
UPDATE: I also needed a decision tree classifier. To read it back, I needed to import DecisionTreeModel:
from pyspark.mllib.tree import DecisionTree, DecisionTreeModel
You can save your model by using the save method of mllib models. After storing it, you can load it in another application.
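As a minimal sketch of that save-and-load round trip, assuming Spark 1.4 or later with the RDD-based pyspark.mllib API: the function name save_and_reload, the toy training data, and the path argument are made up for illustration and are not from the question. The target path should not already exist when save is called.

```python
def save_and_reload(sc, path):
    """Train a tiny model, save it under `path`, and load it back.

    `sc` is an existing SparkContext; `path` is a hypothetical location
    (local directory or HDFS URI) that should not exist yet.
    """
    # Imported inside the function so this file can be loaded even on a
    # machine without Spark installed.
    from pyspark.mllib.classification import (
        LogisticRegressionModel,
        LogisticRegressionWithLBFGS,
    )
    from pyspark.mllib.regression import LabeledPoint

    # Toy training set, for illustration only.
    data = sc.parallelize([
        LabeledPoint(0.0, [0.0, 1.0]),
        LabeledPoint(1.0, [1.0, 0.0]),
    ])
    model = LogisticRegressionWithLBFGS.train(data, iterations=10)

    # Write the model to storage.
    model.save(sc, path)

    # In another application, only this load call is needed.
    return LogisticRegressionModel.load(sc, path)
```

The second program only needs the load line plus its own SparkContext; the returned model's predict method can then be called on new feature vectors. For the decision tree case from the update, DecisionTreeModel.load(sc, path) plays the same role.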
As @zero323 stated before, another way to achieve this is to export the model to the Predictive Model Markup Language (PMML).