How to print the probability of prediction in Logi

Posted 2019-06-02 11:31

I am using Spark 1.5.1. In PySpark, after fitting the model with:

model = LogisticRegressionWithLBFGS.train(parsedData)

I can print the prediction using:

model.predict(p.features)

Is there a function to print the probability score also along with the prediction?

2 answers
我只想做你的唯一
Reply #2 · 2019-06-02 11:52

I presume the question is about computing the probability score for predictions on the entire training set. If so, this is how I did it (not sure if the post is still active):

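# Imports assumed by this snippet
from pyspark.mllib.classification import LogisticRegressionWithLBFGS
from pyspark.mllib import regression as reg

# Start from the original training data, before it is converted to rows of LabeledPoint.
# Assume it is a Spark DataFrame called otd, with the label in column 0.
# Extract the feature set as an RDD:
fs = otd.rdd.map(lambda x: x[1:])

# One possible way to build the LabeledPoint rows (labels shifted down by 1 so they start at 0):
parsedData = otd.rdd.map(lambda x: reg.LabeledPoint(int(x[0] - 1), x[1:]))

# Convert otd to a pandas DataFrame so the labels can be compared later:
ptd = otd.toPandas()
m = ptd.shape[0]

# Train the model (on parsedData, the LabeledPoint RDD built above)
model = LogisticRegressionWithLBFGS.train(parsedData, numClasses=10)

# Predict on the feature RDD and collect the predicted classes
predict = model.predict(fs)
pr = predict.collect()

# Percentage of correct predictions (100.0 forces float division on Python 2)
correct = ((ptd.label - 1) == pr).sum()
print(100.0 * correct / m)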

Note the above is for multi-class classification.
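
Note that the snippet above prints accuracy, not per-class probabilities. For a multi-class model, predict() returns the class index, and as far as I know clearing the threshold does not change that. If you want probabilities, one option is to compute them yourself from the per-class margins. The sketch below is plain Python and only shows the softmax step with class 0 as the pivot, which is my understanding of how MLlib's multinomial model is parameterized. The function name pivot_softmax and the example margins are made up for illustration; how you obtain the margins from model.weights depends on the weight layout, which I am not covering here.

```python
import math

def pivot_softmax(margins):
    """Turn margins for classes 1..K-1 into K probabilities.

    Class 0 is treated as the pivot with a fixed margin of 0.
    """
    all_margins = [0.0] + list(margins)
    # Subtract the maximum before exponentiating to avoid overflow
    top = max(all_margins)
    exps = [math.exp(v - top) for v in all_margins]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical margins for a 3-class problem (classes 1 and 2)
probs = pivot_softmax([1.0, -0.5])

# The predicted class is the one with the highest probability
predicted = max(range(len(probs)), key=probs.__getitem__)
```

The probabilities sum to 1, and the class with the largest margin (counting the pivot's margin of 0) gets the largest probability.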

forever°为你锁心
Reply #3 · 2019-06-02 12:03

You have to clear the threshold first, and this works only for binary classification:

 from pyspark.mllib.classification import LogisticRegressionWithLBFGS, LogisticRegressionModel
 from pyspark.mllib.regression import LabeledPoint

 parsed_data = [LabeledPoint(0.0, [4.6,3.6,1.0,0.2]),
                LabeledPoint(0.0, [5.7,4.4,1.5,0.4]),
                LabeledPoint(1.0, [6.7,3.1,4.4,1.4]),
                LabeledPoint(0.0, [4.8,3.4,1.6,0.2]),
                LabeledPoint(1.0, [4.4,3.2,1.3,0.2])]   

 model = LogisticRegressionWithLBFGS.train(sc.parallelize(parsed_data))
 model.threshold
 # 0.5
 model.predict(parsed_data[2].features)
 # 1

 model.clearThreshold()
 model.predict(parsed_data[2].features)
 # 0.9873840020002339
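
For reference, the raw score you get after clearThreshold() in the binary case is the logistic function applied to the linear margin. If you want the probability and the 0/1 label at the same time without toggling the threshold, you can compute it yourself from the model's coefficients. This is a minimal sketch in plain Python: sigmoid_probability is a helper name I made up, and the weights below are hypothetical numbers, not the ones from the model above. In practice you would read them from model.weights and model.intercept.

```python
import math

def sigmoid_probability(features, weights, intercept=0.0):
    """Return P(label = 1) for a binary logistic regression model."""
    margin = sum(w * x for w, x in zip(weights, features)) + intercept
    # Numerically stable form of 1 / (1 + exp(-margin))
    if margin >= 0:
        return 1.0 / (1.0 + math.exp(-margin))
    e = math.exp(margin)
    return e / (1.0 + e)

# Hypothetical coefficients, for illustration only
example_weights = [0.5, -0.25]
example_intercept = 0.0

prob = sigmoid_probability([2.0, 1.0], example_weights, example_intercept)

# Apply the 0.5 threshold by hand to recover the class label
label = 1 if prob > 0.5 else 0
```

This way prob and label are both available for each point, which is what the question asks for.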