How to get Precision/Recall using CrossValidator f

Supossed I have a Pipeline like this:

val tokenizer = new Tokenizer().setInputCol("tweet").setOutputCol("words")
val hashingTF = new HashingTF().setNumFeatures(1000).setInputCol("words").setOutputCol("features")
val idf = new IDF().setInputCol("features").setOutputCol("idffeatures")
val nb = new org.apache.spark.ml.classification.NaiveBayes()
val pipeline = new Pipeline().setStages(Array(tokenizer, hashingTF, idf, nb))
val paramGrid = new ParamGridBuilder().addGrid(hashingTF.numFeatures, Array(10, 100, 1000)).addGrid(nb.smoothing, Array(0.01, 0.1, 1)).build()
val cv = new CrossValidator().setEstimator(pipeline).setEvaluator(new BinaryClassificationEvaluator()).setEstimatorParamMaps(paramGrid).setNumFolds(10)
val cvModel = cv.fit(df)

As you can see I defined a CrossValidator using a MultiClassClassificationEvaluator. I have seen a lot of examples getting metrics like Precision/Recall during testing process but these metris are gotten when you use a different set of data for testing purposes (See for example this documentation).

From my understanding, CrossValidator is going to create folds and one fold will be use for testing purposes, then CrossValidator will choose the best model. My question is, is possible to get Precision/Recall metrics during training process?

标签： apache-spark apache-spark-mllib apache-spark-ml apache-spark-1.5

1条回答

男人必须洒脱

2楼-- · 2019-04-13 08:08

Well, the only metric which is actually stored is the one you define when you create an instance of an Evaluator. For the BinaryClassificationEvaluator this can take one of the two values:

areaUnderROC
areaUnderPR

with the former one being default, and can be set using setMetricName method.

These values are collected during training process and can accessed using CrossValidatorModel.avgMetrics. Order of values corresponds to the order of EstimatorParamMaps (CrossValidatorModel.getEstimatorParamMaps).

0人赞添加讨论(0) 举报

How to get Precision/Recall using CrossValidator f

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间