I'm wondering if it's possible (using the built-in features of SparkR or any other workaround) to extract the class probabilities from some of the classification algorithms that are included in SparkR. The ones of particular interest are:
spark.gbt()
spark.mlp()
spark.randomForest()
Currently, when I use the predict function on these models I am able to extract the predictions, but not the actual probabilities or "confidence."
I've seen several other questions on similar topics, but none that are specific to SparkR, and many have not been answered with regard to Spark's most recent updates.
I ran into the same problem and, following this answer, now use SparkR:::callJMethod to transform the probability DenseVector (which R cannot deserialize) to an Array (which R reads as a List). It's not very elegant or fast, but it does the job. For example:
start your Spark session:
generate toy data:
train a random forest and run predictions:
collect your predictions:
now extract the probabilities:
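The steps above can be sketched roughly as follows. This is a minimal sketch, not the original code: it assumes a local Spark installation, and the column names (clicked, x), the toy data, and the to_array wrapper are made up for illustration.

```r
library(SparkR)

# Start a local Spark session (assumes a working Spark installation).
sparkR.session(master = "local[*]")

# Toy data; column names are illustrative. base::sample is called explicitly
# because SparkR also exports a sample() for SparkDataFrames.
set.seed(1)
toy <- data.frame(
  clicked = base::sample(c("no", "yes"), 100, replace = TRUE),
  x = stats::rnorm(100),
  stringsAsFactors = FALSE
)
train_idx <- base::sample(100, 70)
train_df <- createDataFrame(toy[train_idx, ])
test_df <- createDataFrame(toy[-train_idx, ])

# Train a random forest classifier and score the held-out rows.
rf <- spark.randomForest(train_df, clicked ~ x, type = "classification",
                         maxDepth = 2, numTrees = 20)
predictions <- predict(rf, test_df)

# Collect to a local data.frame. The probability column arrives as a list of
# Java object references (DenseVector), not as numbers.
collected <- collect(predictions)

# Hypothetical wrapper: ask the JVM for the vector's values via toArray().
to_array <- function(jvec) unlist(SparkR:::callJMethod(jvec, "toArray"))

# One numeric vector of class probabilities per row.
collected$prob <- base::lapply(collected$probability, to_array)
print(collected$prob[[1]])
```

Note that the positions inside each probability vector follow Spark's internal label indexing, so check which position corresponds to which class before naming the columns.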
Of course, the function wrapper around SparkR:::callJMethod is a bit of overkill. You can also use it directly, e.g. with dplyr:
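Something along these lines should work; treat it as a sketch under assumptions rather than the original snippet. It assumes the collected predictions are a local data.frame with a probability list column of DenseVector references, and add_probability is a hypothetical helper name. dplyr is called with the dplyr:: prefix instead of being attached, since SparkR and dplyr export several functions with the same names.

```r
library(SparkR)

# Hypothetical helper: `collected` is assumed to be the local data.frame
# returned by collect(predict(model, data)), with a `probability` list column
# holding DenseVector references. An active Spark session is required.
add_probability <- function(collected) {
  dplyr::mutate(
    collected,
    # Call toArray() on each DenseVector directly, without a named wrapper.
    prob = base::lapply(probability, function(v) {
      unlist(SparkR:::callJMethod(v, "toArray"))
    })
  )
}
```

The result keeps the original columns and adds prob as a list column with one numeric vector per row.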