Building a User Based Collaborative Filtering Reco

Posted 2019-04-14 04:41

I have a matrix with 129539 rows and 530 columns. The first column corresponds to ClientIDs and the first row to product brands. Each cell holds a ranking index that the ClientID has for that product brand (0 if the ClientID never bought the product, and up to 10 otherwise).

I am building a User Based Collaborative Filtering Recommender System in R, using the first 5000 rows for training, and it gives me an output that doesn't make sense to me.

The code I have to generate it is the following:

library(recommenderlab)

# Load the pre-computed affinity data
affinity.data <- read.csv("mydirectory")
affinity.matrix <- as(affinity.data, "realRatingMatrix")

# Create the model: U(ser) B(ased) C(ollaborative) F(iltering)
Rec.model <- Recommender(Rank_dataframe[1:5000, ], method = "UBCF",
                         param = list(normalize = "Z-score", method = "Cosine", nn = 5, minRating = 0))

# Recommend the top 5 items for user 1507323
recommended.items.1507323 <- predict(Rec.model, affinity.matrix["1507323", ], n = 5)

# Display them
as(recommended.items.1507323, "list")

The output I'm getting is a list of values such as:
[[1]]
[1] "0.0061652281134402" "0.00661813368630046" "0.0119331742243437" "0.0136147038801906"
[5] "0.0138312586445367"

I was expecting the names of the brands being recommended, not a list of numbers. P.S. My original matrix has values from 0 to 10 (decimals included, not only integers).

Thank you very much for any help or clarification you may have.

1 answer
在下西门庆
Reply #2 · 2019-04-14 05:21

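There are a couple of issues here. First, be explicit about what you ask predict() for. With type = "ratings" it returns the predicted rating of each item for the user you chose; to build a Top-N list from that, predict the rating for every item, sort the ratings, and return the top N. With type = "topNList" (recommenderlab's default) it returns item labels directly, so the quoted strings in your output are labels, and labels that look like numbers suggest the brand names were lost when the data was coerced to a realRatingMatrix.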
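As a rough illustration of both routes, here is a minimal sketch on a tiny hand-made matrix. The data, the client and brand names, and the choice of normalize = "center" with nn = 3 are assumptions made for the toy example, not values from the question.

```r
library(recommenderlab)

# Hypothetical toy data: 6 clients x 5 brands, NA = never bought.
m <- matrix(c(
   9,  7, NA,  2, NA,
   8, NA,  6,  1,  3,
  NA,  9,  5,  2,  4,
   2,  3, NA,  9,  8,
   1, NA,  4, 10,  7,
   9,  8, NA, NA, NA
), nrow = 6, byrow = TRUE,
dimnames = list(paste0("client", 1:6), paste0("brand", LETTERS[1:5])))
ratings <- as(m, "realRatingMatrix")

# Train on the first five clients, predict for the sixth.
# normalize = "center" and nn = 3 are choices for this toy data only.
rec <- Recommender(ratings[1:5, ], method = "UBCF",
                   parameter = list(normalize = "center", method = "cosine", nn = 3))

# Option 1: ask for ratings, then sort and keep the top N yourself.
pred <- predict(rec, ratings["client6", ], type = "ratings")
scores <- as(pred, "matrix")[1, ]   # NA for brands the client already rated
top_n <- names(head(sort(scores, decreasing = TRUE), 3))

# Option 2: ask for the Top-N list directly; the result holds item labels.
top_list <- as(predict(rec, ratings["client6", ], n = 3, type = "topNList"), "list")
```

Because the matrix carries brand names as column names, both top_n and top_list should contain brand labels rather than numbers.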

Second, recommender systems normally represent a user and item that have never interacted as NULL, NA, or missing data. You have used 0 instead. Because most users never interact with most items, the predictions will be heavily skewed toward 0, and they end up estimating whether a user will interact with an item at all rather than how much the user will like it. This may be a feature or a bug, depending on your use case. But if the non-zero ratings (up to 10) represent preferences and 0 is a binary used/not-used flag, then you are mixing two kinds of information and should replace 0 with NA.
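A minimal sketch of that replacement, assuming the CSV loads as a wide table with the client ID in the first column and one column per brand. The column names and values below are made up for illustration.

```r
# Hypothetical wide table: first column is the client ID, the rest are brands.
affinity.data <- data.frame(
  ClientID = c("1507323", "1507324", "1507325"),
  BrandA = c(0, 7.5, 3),
  BrandB = c(10, 0, 0),
  BrandC = c(2.5, 4, 0)
)

# Move the IDs into row names so that only ratings remain in the matrix.
m <- as.matrix(affinity.data[, -1])
rownames(m) <- affinity.data$ClientID

# Treat "never bought" as missing rather than as a rating of 0.
m[m == 0] <- NA

# With recommenderlab loaded, the numeric matrix can then be coerced:
# affinity.matrix <- as(m, "realRatingMatrix")
```

Coercing the numeric matrix, rather than the data frame itself, should also keep the brand names as item labels; as far as I understand recommenderlab's coercions, a data frame is read as user/item/rating triplets instead.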
