I am trying to make predictions with the model that I got back from MLlib on Spark. The goal is to generate tuples of (originalLabelInData, predictedLabel), which can then be used for model evaluation. What is the best way to achieve this? Thanks.
Assuming parsedTrainData is an RDD of LabeledPoint:
from pyspark.mllib.regression import LabeledPoint
from pyspark.mllib.tree import DecisionTree, DecisionTreeModel
from pyspark.mllib.util import MLUtils
parsedTrainData = sc.parallelize([LabeledPoint(1.0, [11.0, -12.0, 23.0]),
                                  LabeledPoint(3.0, [-1.0, 12.0, -23.0])])
model = DecisionTree.trainClassifier(parsedTrainData, numClasses=7,
                                     categoricalFeaturesInfo={}, impurity='gini',
                                     maxDepth=8, maxBins=32)
model.predict(parsedTrainData.map(lambda x: x.features)).take(1)
This gives back the predictions, but I am not sure how to match each prediction back to its original label in the data.
I tried
parsedTrainData.map(lambda x: (x.label, model.predict(x.features))).take(1)
However, it seems that referencing the model inside a transformation sent to the workers is not a valid thing to do here:
/spark140/python/pyspark/context.pyc in __getnewargs__(self)
    250         # This method is called when attempting to pickle SparkContext, which is always an error:
    251         raise Exception(
--> 252             "It appears that you are attempting to reference SparkContext from a broadcast "
    253             "variable, action, or transformation. SparkContext can only be used on the driver, "
    254             "not in code that it run on workers. For more information, see SPARK-5063."
Exception: It appears that you are attempting to reference SparkContext from a broadcast variable, action, or transformation. SparkContext can only be used on the driver, not in code that it run on workers. For more information, see SPARK-5063.
Well, according to the official documentation, you can simply zip the predictions with the labels, like this:
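A minimal sketch of that pattern, mirroring the decision-tree example in the MLlib guide and reusing parsedTrainData and model from the snippet above:
# Call predict once on the driver over the whole features RDD; the resulting
# RDD keeps the same partitioning, so it can be zipped with the labels.
predictions = model.predict(parsedTrainData.map(lambda x: x.features))
labelsAndPredictions = parsedTrainData.map(lambda x: x.label).zip(predictions)

# (originalLabelInData, predictedLabel) tuples, usable for evaluation, e.g.:
trainErr = (labelsAndPredictions.filter(lambda lp: lp[0] != lp[1]).count()
            / float(parsedTrainData.count()))
This avoids the SPARK-5063 error because the model is only ever used on the driver; no closure shipped to the workers references it.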