I want to build a recommendation application using Spark MLlib and the ALS algorithm for collaborative filtering. My dataset has the user and product fields as strings, like:
[{"user":"StringName1", "product":"StringProduct1", "rating":1},
{"user":"StringName2", "product":"StringProduct2", "rating":2},
{"user":"StringName1", "product":"StringProduct2", "rating":3},..]
But the Rating method seems to accept only int values for both user and product features. Does that mean I will have to build a separate dictionary to map each string to an int? My dataset will have duplicate entries for both user and product. Is there a built-in solution for this in the MLlib library itself?
Thanks and any help appreciated!
Edit: No, this is not a duplicate, as the answer in that question doesn't seem to fit my scenario. The spark.ml.recommendation.ALS.Rating library doesn't seem to support String values for user or item. I need this support.
Let me try. Assuming that
data: RDD[(String, String, Float)]
That should do it. Basically, you just create a mapping from each distinct string to a long and then convert the long to an int.
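For reference, here is a minimal sketch of that string-to-long-to-int mapping, assuming data has the type given above. The object and method names (StringIdMapping, toRatings) are made up for illustration and are not part of MLlib, and the .toInt conversion assumes there are fewer than Int.MaxValue distinct users and products.

```scala
import org.apache.spark.mllib.recommendation.Rating
import org.apache.spark.rdd.RDD

// Hypothetical helper object: the names here are illustrative, not an MLlib API.
object StringIdMapping {

  // Returns the ratings with integer IDs, plus the two lookup tables
  // (string -> int) so predictions can be translated back later.
  def toRatings(data: RDD[(String, String, Float)])
      : (RDD[Rating], RDD[(String, Int)], RDD[(String, Int)]) = {

    // distinct() collapses duplicate users/products; zipWithIndex assigns
    // each one a Long index, which is then narrowed to Int.
    // Assumption: the number of distinct values fits in an Int.
    val userIds: RDD[(String, Int)] =
      data.map(_._1).distinct().zipWithIndex().mapValues(_.toInt)
    val productIds: RDD[(String, Int)] =
      data.map(_._2).distinct().zipWithIndex().mapValues(_.toInt)

    // Replace the user string, then the product string, via two joins.
    val ratings: RDD[Rating] = data
      .map { case (user, product, rating) => (user, (product, rating)) }
      .join(userIds)
      .map { case (_, ((product, rating), userId)) => (product, (userId, rating)) }
      .join(productIds)
      .map { case (_, ((userId, rating), productId)) =>
        Rating(userId, productId, rating.toDouble)
      }

    (ratings, userIds, productIds)
  }
}
```

The resulting RDD[Rating] can be passed to ALS.train(ratings, rank, iterations, lambda). Keep userIds and productIds around to map the integer IDs in the predictions back to the original strings.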