I trained a classification model in Apache Spark (using pyspark) and stored it in a LogisticRegressionModel object. Now I want to make predictions on new data. I would like to store the model and read it back into a new program to make the predictions. Any idea how to store the model? I'm thinking of maybe pickle, but I'm a newbie to both Python and Spark, so I'd like to hear what the community thinks.
UPDATE: I also needed a decision tree classifier. To read it back, I had to import DecisionTreeModel as well: from pyspark.mllib.tree import DecisionTree, DecisionTreeModel
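
For anyone landing here, this is a rough sketch of the save/load round trip, not a definitive recipe. It assumes a Spark version whose pyspark.mllib models expose save(sc, path) and a load(sc, path) classmethod. The helper names save_model and load_model, the kind argument, and the example path are made up for illustration.

```python
def save_model(sc, model, path):
    """Persist a trained mllib model under `path` (hypothetical helper).

    `sc` is the active SparkContext; `path` may be a local or HDFS path.
    """
    model.save(sc, path)


def load_model(sc, path, kind="logistic"):
    """Read a saved model back in a new program (hypothetical helper).

    `kind` is an illustrative switch, not part of the Spark API.
    """
    # Imports are deferred so this file can be imported without Spark installed.
    from pyspark.mllib.classification import LogisticRegressionModel
    from pyspark.mllib.tree import DecisionTreeModel

    loaders = {
        "logistic": LogisticRegressionModel,
        "tree": DecisionTreeModel,
    }
    # Each model class provides its own load(sc, path) classmethod.
    return loaders[kind].load(sc, path)


# Usage (assumed path, requires a running SparkContext `sc`):
#   save_model(sc, model, "models/my_lr_model")
#   model = load_model(sc, "models/my_lr_model", kind="logistic")
#   prediction = model.predict(features)
```

The load call goes through the model class rather than the trainer, which is why DecisionTreeModel has to be imported in addition to DecisionTree.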