I have logistic regression mode, where I explicitly set the threshold to 0.5.
model.setThreshold(0.5)
I train the model and then I want to get basic stats -- precision, recall etc.
This is what I do when I evaluate the model:
val metrics = new BinaryClassificationMetrics(predictionAndLabels)
val precision = metrics.precisionByThreshold
precision.foreach { case (t, p) =>
println(s"Threshold is: $t, Precision is: $p")
}
I get results with only 0.0 and 1.0 as values of threshold and 0.5 is completely ignored.
Here is the output of the above loop:
Threshold is: 1.0, Precision is: 0.8571428571428571
Threshold is: 0.0, Precision is: 0.3005181347150259
When I call metrics.thresholds() it also returns only two values, 0.0 and 1.0.
How do I get the precision and recall values with threshold as 0.5?
You need to clear the model threshold before you make predictions. Clearing threshold makes your predictions return a score and not the classified label. If not you will only have two thresholds, i.e. your labels 0.0 and 1.0.
A tuple from predictionsAndLabels should look like
(0.6753421,1.0)
and not(1.0,1.0)
Take a look at https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/mllib/BinaryClassificationMetricsExample.scala
You probably still want to set numBins to control the number of points if the input is large.
First, try adding more bins like this (here numBins is 10):
If you still only have two thresholds of 0 and 1, then check to make sure the way you have defined your predictionAndLabels. You many be having this problem if you have accidentally provided
(label, prediction)
instead of(prediction, label)
.I think what happens is that all the predictions are 0.0 or 1.0. Then the intermediate threshold values make no difference.
Consider the
numBins
argument ofBinaryClassificationMetrics
:So if you don't set
numBins
, then precision will be calculated at all the different prediction values. In your case this seems to be just 0.0 and 1.0.