I am an experienced Python programmer trying to transition some Python code to Spark for a classification task. This is my first time working in Spark/Scala.
In Python, both Keras/TensorFlow and scikit-learn neural networks do a great job on the multi-class classification, and I'm able to easily return the top 3 most probable classes along with their probabilities, which are key to this project.
I have been generally successful in moving the code to Spark (Scala), and I'm able to generate the correct predictions, but I have not been able to find a way to return probabilities for the top predicted classes from the MultilayerPerceptronClassifier in MLlib.
The closest solution I found was in this post: How to get classification probabilities from MultilayerPerceptronClassifier? However, I'm not able to get the solution in the post to work either because it's missing a key piece of code or I'm too new to Scala (probably the latter) to make the needed adjustments.
Has anyone solved this problem?
These are the current versions in my environment: Spark 2.1.1, Scala 2.11.8.
Thanks for your help,
RKB
If you take a careful look at the results of MultilayerPerceptronClassificationModel.transform (with model and test as defined in the example pipeline in the official documentation), you'll see they contain a probability column. It is stored as an o.a.s.ml.linalg.Vector column and can be accessed using standard methods.
This feature has been available since Spark 2.3 (SPARK-12664 Expose probability, rawPrediction in MultilayerPerceptronClassificationModel), so it is not present in Spark 2.1.1.
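As a rough sketch of how the top 3 classes might be pulled out of that probability vector: the helper below is plain Scala and only ranks an array of probabilities. The names TopKProbabilities, topK and top3 are made up for illustration, and the commented-out Spark wiring assumes Spark 2.3 or later, with model and test defined as in the documentation example.

```scala
object TopKProbabilities {
  // Returns the k most probable (classIndex, probability) pairs, highest first.
  // Ties keep the lower class index first because sortBy is a stable sort.
  def topK(probs: Array[Double], k: Int): Seq[(Int, Double)] =
    probs.zipWithIndex
      .sortBy { case (p, _) => -p }
      .take(k)
      .map { case (p, i) => (i, p) }
      .toSeq
}

// Possible Spark wiring (an assumption, requires Spark 2.3+ on the classpath):
//
//   import org.apache.spark.ml.linalg.Vector
//   import org.apache.spark.sql.functions.{col, udf}
//
//   val top3 = udf((v: Vector) => TopKProbabilities.topK(v.toArray, 3))
//   val predictions = model.transform(test)
//     .withColumn("top3", top3(col("probability")))
```

The class indices returned here are positions in the probability vector, so if the labels were indexed before training they would still need to be mapped back to the original label values.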