There is a dataset of users ranking movies. Need to find the users with similar taste to user1. Similar taste defined as follows: consider the average rank for genre
from user1
as avgr1
and for the same genre from user2
as avgr2
, then user1
and user2
have similar taste is abs(avgr1-avgr2)<1
. So far I was able to get the names, the genre and the absolute value between averages, but the filtering for comparison is not working.
SELECT ?p ?p1 ?genre (abs (AVG(?rating)-AVG(?ratingp1)) AS ?RDiff)
WHERE{
?p movies:hasRated ?rate.
?p1 foaf:knows ?p.
?rate movies:ratedMovie ?mov.
?rate movies:hasRating ?rating.
?mov movies:hasGenre ?genre.
?p1 movies:hasRated ?ratep1.
?ratep1 movies:ratedMovie ?movp1.
?ratep1 movies:hasRating ?ratingp1.
?movp1 movies:hasGenre ?genre.
FILTER (xsd:float(?Rdiff)<1.0 && ?p=movies:user1)
}
GROUP BY ?p ?p1 ?genre
It's very hard to answer these kinds of questions without some sample data to work with. Here's some sample data that has two users who have similar rankings on comedy, but different rankings on romance:
@prefix : <urn:ex:>
:a :ranks [ :genre :comedy ; :value 2 ],
[ :genre :comedy ; :value 3 ],
[ :genre :comedy ; :value 3 ],
[ :genre :romance ; :value 7 ],
[ :genre :romance ; :value 8 ],
[ :genre :romance ; :value 9 ].
:b :ranks [ :genre :comedy ; :value 3 ],
[ :genre :comedy ; :value 3 ],
[ :genre :comedy ; :value 4 ],
[ :genre :romance ; :value 0 ],
[ :genre :romance ; :value 1 ],
[ :genre :romance ; :value 0 ].
Here's a query that computes the difference of their average rankings on genres:
prefix : <urn:ex:>
select ?user1 ?user2 ?genre (abs(avg(?value1)-avg(?value2)) as ?diff) {
?user1 :ranks [ :genre ?genre ; :value ?value1 ].
?user2 :ranks [ :genre ?genre ; :value ?value2 ].
filter (str(?user1) < str(?user2)) #-- avoid duplicate user1/user2, user2/user1 results
}
group by ?user1 ?user2 ?genre
order by ?diff
---------------------------------------------------------
| user1 | user2 | genre | diff |
=========================================================
| :a | :b | :comedy | 0.666666666666666666666667 |
| :a | :b | :romance | 7.666666666666666666666667 |
---------------------------------------------------------
Now, you can't filter on aggregate results, you have to use having, so to only take values where diff is less than some particular value, you'd do this:
prefix : <urn:ex:>
select ?user1 ?user2 ?genre (abs(avg(?value1)-avg(?value2)) as ?diff) {
?user1 :ranks [ :genre ?genre ; :value ?value1 ].
?user2 :ranks [ :genre ?genre ; :value ?value2 ].
filter (str(?user1) < str(?user2))
}
group by ?user1 ?user2 ?genre
having (?diff < 1)
order by ?diff
--------------------------------------------------------
| user1 | user2 | genre | diff |
========================================================
| :a | :b | :comedy | 0.666666666666666666666667 |
--------------------------------------------------------
If you don't care about the actual diff, except that it's below the threshold, you can put the expression in the having directly, and do:
select ?user1 ?user2 ?genre {
#-- ...
}
group by ?user1 ?user2 ?genre
having (abs(avg(?value1)-avg(?value2)) < 1)
?Rdiff is not defined in the FILTER - the happens before the AS in the SELECT.
Try using HAVING which goes after the GROUP BY.