Understanding the Pearson Correlation Coefficient

Posted 2019-05-23 12:12

As part of the calculations to generate a Pearson Correlation Coefficient, the following computation is performed:

[Image: the formulas referred to below; not reproduced here]

In the second formula, p_a,i is the predicted rating that user a would give item i, n is the number of similar users being compared against, and r_u,i is the rating of item i by user u.

What value will be used if user u has not rated this item? Did I misunderstand anything here?

2 Answers
祖国的老花朵
#2 · 2019-05-23 12:23

According to the link, the earlier calculations in step 1 of the algorithm run over a set of items indexed 1 to m, where m is the total number of items in common.

Step 3 of the algorithm specifies: "To find a rating prediction for a particular user for a particular item, first select a number of users with the highest, weighted similarity scores with respect to the current user that have rated on the item in question."

These calculations are performed only on the intersection of the users' sets of rated items. No calculation is performed for a user who has not rated the item.
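A minimal Python sketch of that idea follows. The `ratings` data, the function names, and the mean-centred weighted average used for the prediction are all assumptions for illustration; the formulas from the question were not available, so this is a common form of the calculation, not necessarily the one in the linked algorithm.

```python
from math import sqrt

# Hypothetical data: user -> {item: rating}. A missing key means "not rated".
ratings = {
    "a":  {"i1": 5, "i2": 3, "i3": 4},
    "u1": {"i1": 4, "i2": 2, "i3": 3, "i4": 5},
    "u2": {"i1": 1, "i2": 5, "i4": 2},
    "u3": {"i1": 5, "i2": 3, "i3": 4},  # has not rated i4
}

def pearson(x, y):
    """Pearson correlation over the items both users have rated."""
    common = set(x) & set(y)  # the m items in common
    m = len(common)
    if m < 2:
        return 0.0  # too little overlap to say anything
    mean_x = sum(x[i] for i in common) / m
    mean_y = sum(y[i] for i in common) / m
    cov = sum((x[i] - mean_x) * (y[i] - mean_y) for i in common)
    var_x = sum((x[i] - mean_x) ** 2 for i in common)
    var_y = sum((y[i] - mean_y) ** 2 for i in common)
    if var_x == 0 or var_y == 0:
        return 0.0  # constant ratings: correlation is undefined
    return cov / sqrt(var_x * var_y)

def neighbours(active, item, n):
    """Top-n similar users, restricted to those who rated `item`."""
    scored = [
        (pearson(ratings[active], r), u)
        for u, r in ratings.items()
        if u != active and item in r  # users without a rating are skipped
    ]
    return sorted(scored, reverse=True)[:n]

def mean_rating(user):
    values = ratings[user].values()
    return sum(values) / len(values)

def predict(active, item, n=2):
    """Assumed form: active user's mean plus weighted mean-centred ratings."""
    chosen = neighbours(active, item, n)
    norm = sum(abs(w) for w, _ in chosen)
    if norm == 0:
        return mean_rating(active)  # no usable neighbours
    offset = sum(w * (ratings[u][item] - mean_rating(u)) for w, u in chosen)
    return mean_rating(active) + offset / norm
```

Here `u3` is the most similar user to `a`, yet it never enters the prediction for `i4` because it has no rating for that item, so no placeholder value for r_u,i is ever needed.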

淡お忘
#3 · 2019-05-23 12:36

It only makes sense to calculate results if both users have rated a movie. Linear regression can be visualised as finding a straight line through a two-dimensional graph in which one variable is plotted on the X axis and the other on the Y axis. Each pair of ratings is represented as a point [u1_rating, u2_rating] on a Euclidean plane. Since you cannot plot a point that has only one coordinate, you have to discard those cases.
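As a rough illustration of this picture, the sketch below builds the [u1_rating, u2_rating] points from two made-up rating lists, drops every movie that only one user rated, and fits an ordinary least-squares line through what remains. The data and the `fit_line` helper are hypothetical.

```python
# Hypothetical ratings by two users for the same five movies; None = not rated.
u1_ratings = [5, 3, None, 4, 1]
u2_ratings = [4, None, 2, 5, 2]

# Keep only movies rated by both users; a pair with a missing
# coordinate cannot be plotted and is discarded.
points = [
    (x, y)
    for x, y in zip(u1_ratings, u2_ratings)
    if x is not None and y is not None
]

def fit_line(points):
    """Ordinary least-squares line y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    if sxx == 0:
        raise ValueError("all x values are equal; the slope is undefined")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

slope, intercept = fit_line(points)
```

Only three of the five movies survive the filter here, and the line is fitted to those three points alone.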
