In a MySQL database, I am attempting to find the most similar row across a number of numerical attributes. This problem is similar to this question but includes a flexible number of comparisons and a join table.
Database
The database consists of two tables. The first table, users, is what I'm trying to compare.
id | self_ranking
----------------------------------
1 | 9
2 | 3
3 | 2
The second table is a series of scores which the user gave to particular items.
id | user_id | item_id | score
----------------------------------
1 | 1 | 1 | 4
2 | 1 | 2 | 5
3 | 1 | 3 | 8
4 | 1 | 4 | 3
Task
I want to find the "most similar" user to a given one, valuing all the ranked items equally (along with the self score). Thus, a perfect match would be the user who has ranked all the same items in exactly the same manner & has rated himself the same, while the next most optimal choice would be one whose ranking of one item differs slightly.
I'm running into difficulty with:
- Joining the two tables in an efficient manner
- Dealing with the fact that not all users are ranking the same items. We only want to compare rankings of identical items.
Could someone help me construct a reasonable query? I'm not terribly strong with MySQL, so sorry if this answer should be obvious.
Output
If user 4 has ranked himself 8 and items 1=>4 and 2=>5, then I'd like to have the query for user 4's closest user to return 1, the user_id of the closest user.
In a slight refinement on @eggyal's method, I incorporated the number of items we are able to match against.