Understanding the Pearson Correlation Coefficient

Posted 2019-05-23 12:07

Question:

As part of the calculations to generate a Pearson Correlation Coefficient, the following computation is performed:

In the second formula, p_a,i is the predicted rating user a would give item i, n is the number of similar users being compared to, and r_u,i is the rating of item i by user u.

What value will be used if user u has not rated this item? Did I misunderstand anything here?

Answer 1:

According to the link, the earlier calculations in step 1 of the algorithm are over a set of items, indexed 1 to m, where m is the total number of items in common.

Step 3 of the algorithm specifies: "To find a rating prediction for a particular user for a particular item, first select a number of users with the highest, weighted similarity scores with respect to the current user that have rated on the item in question."
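
The selection described in step 3 can be sketched roughly as follows. This is only an illustration: the function name `predict_rating`, the data layout, and the mean-centred weighted average used to combine the neighbours' ratings are assumptions, since the exact prediction formula is not reproduced in this thread. The part that matters for the question is the filter that keeps only users who have rated the item.

```python
def predict_rating(target_mean, neighbours, item, n=2):
    """Predict a rating for `item` (hypothetical helper, not from the linked article).

    target_mean: mean rating of the user we are predicting for.
    neighbours:  list of (similarity, ratings) pairs, where ratings maps
                 item -> rating for one other user.
    """
    # Users who have not rated the item are dropped before anything else,
    # so no value ever has to be invented for a missing rating.
    candidates = [(w, r) for w, r in neighbours if item in r]

    # Keep the n candidates with the highest similarity score.
    top = sorted(candidates, key=lambda pair: pair[0], reverse=True)[:n]

    denom = sum(abs(w) for w, _ in top)
    if denom == 0:
        return target_mean  # nobody usable: fall back to the user's own mean

    # Assumed combination rule: similarity-weighted deviation from each
    # neighbour's own mean rating.
    num = sum(w * (r[item] - sum(r.values()) / len(r)) for w, r in top)
    return target_mean + num / denom


neighbours = [
    (0.9, {"A": 4, "B": 2}),
    (0.8, {"B": 5}),          # has not rated "A", so it is skipped
    (0.5, {"A": 2, "C": 4}),
]
prediction = predict_rating(3.0, neighbours, "A", n=2)
```

Note that the second user has a higher similarity than the third but is still excluded, because it has no rating for item "A".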

These calculations are performed only on the intersection of the two users' sets of rated items. No calculation is performed for an item that a user has not rated.
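
A minimal sketch of that restriction, assuming each user's ratings are stored as a dictionary from item to rating (the name `pearson` and this layout are illustrative). The means here are taken over the co-rated items only; some variants use each user's overall mean instead, so treat that as one possible choice.

```python
from math import sqrt


def pearson(ratings_a, ratings_b):
    """Pearson correlation over the items both users have rated."""
    # Only the intersection of the two rated sets takes part; m is its size.
    common = sorted(set(ratings_a) & set(ratings_b))
    m = len(common)
    if m < 2:
        return None  # not enough co-rated items to measure correlation

    xs = [ratings_a[i] for i in common]
    ys = [ratings_b[i] for i in common]
    mean_x = sum(xs) / m
    mean_y = sum(ys) / m

    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None  # one user gave identical ratings; correlation undefined

    return cov / sqrt(var_x * var_y)


alice = {"A": 1, "B": 2, "C": 3, "D": 5}
bob = {"A": 2, "B": 4, "C": 6, "E": 1}
similarity = pearson(alice, bob)  # uses items A, B, C only
```

Items "D" and "E" are each rated by only one of the two users, so they never enter the sums.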



Answer 2:

It only makes sense to calculate results if both users have rated a movie. Linear regression can be visualised as finding a straight line through a two-dimensional graph where one variable is plotted on the X axis and the other on the Y axis. Each pair of ratings is a point [u1_rating, u2_rating] on a Euclidean plane. Since you cannot plot a point that has only one coordinate, you have to discard those cases.
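
To make the picture concrete, here is a small sketch that builds the [u1_rating, u2_rating] points and fits an ordinary least-squares line through them. It assumes the two users' ratings are given as parallel lists with `None` marking a movie the user has not rated; the names `paired_points` and `fit_line` are made up for this example.

```python
def paired_points(u1_ratings, u2_ratings):
    """Keep only the movies for which both users supplied a rating."""
    return [
        (x, y)
        for x, y in zip(u1_ratings, u2_ratings)
        if x is not None and y is not None
    ]


def fit_line(points):
    """Ordinary least-squares line y = slope * x + intercept."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two complete points")
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


u1 = [5, 3, None, 1]
u2 = [4, None, 2, 2]
points = paired_points(u1, u2)   # movies 2 and 3 are discarded
slope, intercept = fit_line(points)
```

Only the first and last movies survive the filter, because each of the other two has a rating from just one user.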