I wrote a Spark program for making recommendations using the ALS recommendation library (org.apache.spark.mllib.recommendation). I ran a small test with the following dataset, called trainData:
(u1, m1, 1)
(u1, m4, 1)
(u2, m2, 1)
(u2, m3, 1)
(u3, m1, 1)
(u3, m3, 1)
(u3, m4, 1)
(u4, m3, 1)
(u4, m4, 1)
(u5, m2, 1)
(u5, m4, 1)
The first column is the user, the second is the item the user rated, and the third is the rating.
In my Scala code, I trained the model with:
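MLlib's RDD-based Rating(user: Int, product: Int, rating: Double) takes integer IDs, so labels such as u1 and m1 have to be mapped to Ints before training. The sketch below shows one way to do that in plain Scala, assuming a simple sorted-index scheme. The helper name buildIndex is hypothetical and not part of Spark, and the real mapping used in the question is not shown, so treat this as an illustration only.

```scala
object IdIndexing {
  // The toy dataset from the question: (user, item, rating) with string labels.
  val trainLabels: Seq[(String, String, Double)] = Seq(
    ("u1", "m1", 1.0), ("u1", "m4", 1.0),
    ("u2", "m2", 1.0), ("u2", "m3", 1.0),
    ("u3", "m1", 1.0), ("u3", "m3", 1.0), ("u3", "m4", 1.0),
    ("u4", "m3", 1.0), ("u4", "m4", 1.0),
    ("u5", "m2", 1.0), ("u5", "m4", 1.0)
  )

  // Hypothetical helper: give each distinct label a stable Int ID (sorted order).
  def buildIndex(labels: Seq[String]): Map[String, Int] =
    labels.distinct.sorted.zipWithIndex.toMap

  val userIndex: Map[String, Int] = buildIndex(trainLabels.map(_._1))
  val itemIndex: Map[String, Int] = buildIndex(trainLabels.map(_._2))

  // Integer triples in the shape of MLlib's Rating(user, product, rating);
  // with Spark you would wrap each one in Rating(...) and parallelize the result.
  val indexed: Seq[(Int, Int, Double)] =
    trainLabels.map { case (u, m, r) => (userIndex(u), itemIndex(m), r) }

  def main(args: Array[String]): Unit = {
    println(userIndex) // user label -> Int ID
    indexed.foreach(println)
  }
}
```

Keeping the two maps around also lets you translate the integer IDs in the recommendations back into the original labels.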
val myModel = ALS.trainImplicit(trainData, 3, 5, 0.01, 1.0) // rank = 3, iterations = 5, lambda = 0.01, alpha = 1.0
I then tried to retrieve recommendations for u1 with:
val recommendations = myModel.recommendProducts(idUser, 2)
where idUser holds the integer ID assigned to user u1. As recommendations, I get:
(u1, m1, 1.0536233346170754)
(u1, m4, 0.8540954252858661)
(u1, m3, 0.09069877419040584)
(u1, m2, -0.1345521479521654)
As you can see, the first two recommended items are the ones u1 had already rated (m1 and m4). Whichever user I request recommendations for, I get the same behavior: the top recommendations are the items that user has already rated.
I find it weird! Is there a problem somewhere?
I think that is the expected behaviour of recommendProducts. When you train a matrix factorization algorithm such as ALS, you are trying to find a rating that relates each user to each item. ALS learns this from the items each user has already rated, so when you ask for recommendations for a given user, the model is most confident about the ratings it has already seen, and it will usually rank already-rated products at the top.
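To make the reasoning concrete: a MatrixFactorizationModel predicts a user's score for a product as the dot product of the user's and the product's latent factor vectors, and recommendProducts returns the products with the highest scores. The plain-Scala sketch below reproduces that ranking step. The factor values are made up for illustration (they are not what ALS would learn on this data); the point is only that nothing in the ranking excludes products the model was fitted to, so they can land at the top.

```scala
object DotProductScoring {
  // Made-up rank-2 factors, for illustration only (not learned by ALS).
  val userFactor: Array[Double] = Array(1.0, 0.5)
  val productFactors: Map[Int, Array[Double]] = Map(
    1 -> Array(0.9, 0.2),
    2 -> Array(-0.1, -0.1),
    3 -> Array(0.1, 0.0),
    4 -> Array(0.6, 0.5)
  )

  // Predicted score = dot product of the user vector and the product vector.
  def score(u: Array[Double], p: Array[Double]): Double =
    u.zip(p).map { case (a, b) => a * b }.sum

  // Score every product and sort highest first; already-rated products are not removed.
  def recommend(num: Int): Seq[(Int, Double)] =
    productFactors.toSeq
      .map { case (id, p) => (id, score(userFactor, p)) }
      .sortWith(_._2 > _._2)
      .take(num)

  def main(args: Array[String]): Unit =
    recommend(4).foreach(println)
}
```

With these hypothetical factors the order comes out as products 1, 4, 3, 2, the same shape as the m1, m4, m3, m2 order in the question.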
What you need to do is keep a list of the products each user has rated and filter them out when making the recommendations.
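A minimal sketch of that filtering in plain Scala, assuming the training data is available as (user, product, rating) triples with Int IDs (here u1 maps to 0, m1 to 0, m4 to 3, and so on). Spark's Rating has the same three fields, so the same logic applies to the Array[Rating] returned by recommendProducts. The helper names ratedByUser and recommendUnrated are hypothetical.

```scala
object FilterRated {
  // Training triples (user, product, rating) with Int IDs: u1 -> 0, m1 -> 0, m4 -> 3, etc.
  val train: Seq[(Int, Int, Double)] = Seq(
    (0, 0, 1.0), (0, 3, 1.0),
    (1, 1, 1.0), (1, 2, 1.0),
    (2, 0, 1.0), (2, 2, 1.0), (2, 3, 1.0),
    (3, 2, 1.0), (3, 3, 1.0),
    (4, 1, 1.0), (4, 3, 1.0)
  )

  // Hypothetical helper: the set of products each user has already rated.
  val ratedByUser: Map[Int, Set[Int]] =
    train.groupBy(_._1).map { case (u, rs) => u -> rs.map(_._2).toSet }

  // Drop already-rated products from a ranked (product, score) list and keep the top k.
  def recommendUnrated(user: Int, ranked: Seq[(Int, Double)], k: Int): Seq[(Int, Double)] = {
    val seen = ratedByUser.getOrElse(user, Set.empty[Int])
    ranked.filterNot { case (product, _) => seen.contains(product) }.take(k)
  }

  def main(args: Array[String]): Unit = {
    // The ranked list from the question for u1 (m1, m4, m3, m2 -> 0, 3, 2, 1).
    val ranked = Seq((0, 1.0536233346170754), (3, 0.8540954252858661),
                     (2, 0.09069877419040584), (1, -0.1345521479521654))
    recommendUnrated(0, ranked, 2).foreach(println)
  }
}
```

With MLlib you would ask recommendProducts for k plus the number of products the user has rated, so that at least k candidates survive the filter, and compare on Rating.product instead of the tuple's first element.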
EDIT:
I dug a bit into the source code and the documentation to be sure of what I was saying.
recommendProducts is a method of the class MatrixFactorizationModel (source code), which is the model type that ALS.trainImplicit returns. You can see there that, when making recommendations, the model does not check whether the user has already rated an item. You should also note that if you are using implicit ratings, you most definitely want to recommend products the user has already rated implicitly: imagine that your implicit ratings are page views of your products in an online store, and what you want is for the user to buy the product.
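For context on the implicit case: the Spark MLlib documentation states that trainImplicit follows the approach of "Collaborative Filtering for Implicit Feedback Datasets" (Hu, Koren and Volinsky). There, each observed value r becomes a binary preference (1 if r > 0) weighted by a confidence of 1 + alpha * |r|, while unobserved pairs count as preference 0 with confidence 1. Because the model is fitted toward those preferences, items a user has already interacted with are pulled toward a score of 1, which is consistent with the scores of about 1.05 and 0.85 for m1 and m4 in the question. A small plain-Scala sketch of that transformation (the function names are illustrative, not Spark API):

```scala
object ImplicitPreference {
  // Binary preference: 1.0 if the user interacted with the item (r > 0), else 0.0.
  def preference(r: Double): Double = if (r > 0) 1.0 else 0.0

  // Confidence in that preference: grows with the strength of the observation.
  def confidence(r: Double, alpha: Double): Double = 1.0 + alpha * math.abs(r)

  def main(args: Array[String]): Unit = {
    val alpha = 1.0 // the alpha passed to trainImplicit in the question
    // Every entry in the toy data has r = 1; unobserved pairs behave like r = 0.
    for (r <- Seq(1.0, 0.0))
      println(s"r=$r preference=${preference(r)} confidence=${confidence(r, alpha)}")
  }
}
```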
I don't have access to the book Advanced Analytics with Spark, so I can't comment on the explanations and examples there.
Docs:
ALS
MatrixFactorizationModel