可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效，请关闭广告屏蔽插件后再试):

问题:

In a MySQL database, I am attempting to find the most similar row across a number of numerical attributes. This problem is similar to this question but includes a flexible number of comparisons and a join table.

Database

The database consists of two tables. The first table, users, is what I'm trying to compare.

id | self_ranking
----------------------------------
1       | 9
2       | 3
3       | 2

The second table is a series of scores which the user gave to particular items.

id | user_id | item_id | score
----------------------------------
1  | 1       | 1       | 4
2  | 1       | 2       | 5
3  | 1       | 3       | 8
4  | 1       | 4       | 3

Task

I want to find the "most similar" user to a given one, valuing all the ranked items equally (along with the self score). Thus, a perfect match would be the user who has ranked all the same items in exactly the same manner & has rated himself the same, while the next most optimal choice would be one whose ranking of one item differs slightly.

I'm running into difficulty with:

Joining the two tables in an efficient manner
Dealing with the fact that not all users are ranking the same items. We only want to compare rankings of identical items.

Could someone help me construct a reasonable query? I'm not terribly strong with MySQL, so sorry if this answer should be obvious.

Output

If user 4 has ranked himself 8 and items 1=>4 and 2=>5, then I'd like to have the query for user 4's closest user to return 1, the user_id of the closest user.

回答1:

SELECT   u2.user_id

-- join our user to their scores
FROM     (users u1 JOIN scores s1 USING (user_id))

-- and then join other users and their scores
    JOIN (users u2 JOIN scores s2 USING (user_id))
      ON s1.item_id  = s2.item_id
     AND u1.user_id != u2.user_id

-- filter for our user of interest
WHERE    u1.user_id = ?

-- group other users' scores together
GROUP BY u2.user_id

-- and here's the magic: order in descending order of "distance" between
-- our selected user and all of the others: you may wish to weight
-- self_ranking differently to item scores, in which case just multiply
-- appropriately
ORDER BY SUM(ABS(s2.score - s1.score))
       + ABS(u2.self_ranking - u1.self_ranking) DESC

回答2:

In a slight refinement on @eggyal's method, I incorporated the number of items we are able to match against.

SELECT   u2.user_id

-- join our user to their scores
FROM     (users u1 JOIN scores s1 USING (user_id))

-- and then join other users and their scores
    JOIN (users u2 JOIN scores s2 USING (user_id))
      ON s1.item_id  = s2.item_id
     AND u1.user_id != u2.user_id

-- filter for our user of interest
WHERE    u1.user_id = ?

-- group other users' scores together
GROUP BY u2.user_id

-- subtract the degree of difference in correlating scores from the number of correlating scores
ORDER BY (SUM(s1.item_id = s2.item_id) - 
  ( SUM(ABS(s2.score - s1.score) + ABS(u2.self - u1.self) ) ) ) DESC