Save Apache Spark mllib model in python [duplicate]

Posted 2020-03-03 05:47

I am trying to save a fitted model to a file in Spark. I have a Spark cluster that trains a RandomForest model, and I would like to save the fitted model and reuse it on another machine. I read some posts on the web that recommend Java serialization. I am doing the equivalent in Python, but it does not work. What is the trick?

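import pickle
from pyspark.mllib.tree import RandomForest

model = RandomForest.trainRegressor(trainingData, categoricalFeaturesInfo={},
                                    numTrees=nb_tree, featureSubsetStrategy="auto",
                                    impurity='variance', maxDepth=depth)
# Attempt to serialize the fitted model with pickle (this is the call that fails)
with open('model.ml', 'wb') as output:
    pickle.dump(model, output)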

I am getting this error:

TypeError: can't pickle lock objects

I am using Apache Spark 1.2.0.

Tags: python pyspark apache-spark-mllib

1 answer

Explosion°爆炸

#2 · 2020-03-03 06:34

If you look at the source code, you'll see that RandomForestModel inherits from TreeEnsembleModel, which in turn inherits from the JavaSaveable class. JavaSaveable implements the save() method, so you can save your model as in the example below:

model.save([spark_context], [file_path])
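To make the call above more concrete, here is a minimal sketch of a save-then-load round trip. The helper name save_and_reload, the toy training data, the local[1] master and the temporary path are all assumptions for illustration; only save(sc, path) and the matching RandomForestModel.load(sc, path) come from PySpark itself.

```python
import os
import tempfile


def save_and_reload(sc, model, path, model_cls):
    """Hypothetical helper: persist `model` under `path`, then read it back."""
    # save() writes a directory of files under `path`; it is expected to
    # fail if `path` already exists, so a fresh location is needed each time.
    model.save(sc, path)
    return model_cls.load(sc, path)


def main():
    # pyspark is imported here so the helper above stays importable
    # on a machine without Spark installed.
    from pyspark import SparkContext
    from pyspark.mllib.regression import LabeledPoint
    from pyspark.mllib.tree import RandomForest, RandomForestModel

    sc = SparkContext("local[1]", "save-model-sketch")  # assumed local master
    try:
        # Toy data, only so that there is something to fit.
        points = [LabeledPoint(float(i % 2), [float(i)]) for i in range(20)]
        model = RandomForest.trainRegressor(
            sc.parallelize(points), categoricalFeaturesInfo={},
            numTrees=3, featureSubsetStrategy="auto",
            impurity="variance", maxDepth=2)
        path = os.path.join(tempfile.mkdtemp(), "rf_model")
        restored = save_and_reload(sc, model, path, RandomForestModel)
        print(restored.numTrees(), restored.predict([3.0]))
    finally:
        sc.stop()
```

Calling main() needs a working Spark installation. As far as I can tell, save() and load() were added to PySpark MLlib in Spark 1.3.0, so on 1.2.0 an upgrade would probably be needed first. On a real cluster the path would normally point at storage that the other machine can also read (HDFS, for example) rather than a local temporary directory.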

The save() call writes the model to file_path using the spark_context. You cannot (at least for now) use Python's native pickle to do that. If you really want to, you'll need to implement the __getstate__ and __setstate__ methods manually. See the pickle documentation for more information.
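To illustrate the __getstate__ / __setstate__ route in isolation, here is a small sketch in plain Python, without Spark. The classes Wrapper and PicklableWrapper and the helper try_pickle are hypothetical stand-ins; they only show why an object holding a lock fails to pickle and how the two hooks get around it.

```python
import pickle
import threading


class Wrapper:
    """Hypothetical stand-in for an object that holds an unpicklable lock."""

    def __init__(self, weights):
        self.weights = weights
        self._lock = threading.Lock()


class PicklableWrapper(Wrapper):
    def __getstate__(self):
        # Copy the instance state and drop the lock before pickling.
        state = self.__dict__.copy()
        del state["_lock"]
        return state

    def __setstate__(self, state):
        # Restore the plain attributes and create a fresh lock.
        self.__dict__.update(state)
        self._lock = threading.Lock()


def try_pickle(obj):
    """Return the pickled bytes, or the TypeError raised while pickling."""
    try:
        return pickle.dumps(obj)
    except TypeError as exc:
        return exc


failure = try_pickle(Wrapper([1.0, 2.0]))            # a TypeError instance
restored = pickle.loads(try_pickle(PicklableWrapper([1.0, 2.0])))
```

This would probably not be enough for a real Spark model: as far as I understand, the Python model object is mostly a thin wrapper around a model living in the JVM, so the fitted state is not in the Python object to begin with. That is why save() is the practical option.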
