SPARQL apply MAX on a value of aggregation [duplic

2019-07-08 07:56发布

This question already has an answer here:

I have this query:

prefix : <http://test.example/> 

select  ?anotherUser  (COUNT(?anotherItem) as ?countOfSharedLikedItems) 
WHERE 
{
   values ?user {:ania}.
   ?anotherUser :likes ?anotherItem.

   filter (?anotherUser != ?user)
   filter exists {?user :likes ?anotherItem}
}
group by ?anotherUser
order by desc(?countOfSharedLikedItems)

That runs on this data:

@prefix : <http://test.example/> .

:alice :likes :beethoven.
:alice :likes :verdi.
:sofia :likes :beethoven.
:sofia :likes :verdi.
:sofia :likes :rossini.
:sofia :likes :mozart.
:ania :likes :verdi.
:ania :likes :beethoven.
:ania :likes :david.
:david :likes :ania.
:david :likes :beethoven.
:david :likes :verdi.
:antonino :likes :verdi.
:antonino :likes :mozart.
:weirdo :likes :katyperry.
:beethoven a :recommendable.
:verdi a :recommendable.
:rossini a :recommendable.
:katyperry a :recommendable.
:mozart a :recommendable.

It is working fine, I'm making aggregation and bound the value to a variable called countOfSharedLikedItems. Now I want to have the maximum value of that parameter, I tried this:

prefix : <http://test.example/> 

select  
   ?anotherUser 
   (MAX(?countOfSharedLikedItems) as ?max) 
   (COUNT(?anotherItem) as ?countOfSharedLikedItems)  
WHERE 
{
   values ?user { :ania }.
   ?anotherUser :likes ?anotherItem.

   filter (?anotherUser != ?user)
   filter exists { ?user :likes ?anotherItem }
}
group by ?anotherUser
order by desc(?countOfSharedLikedItems)

But the result for the variable max is always empty.

What wrong did I do?

Hint I do need to have the three variables in the output which are anotherUser, countOfSharedLikedItems, and max. Otherwise, I would have just make an outer select and I could get the max, but I need the three of them, that's why I am asking because i didn't could do it myself

Update 1

I tried this SILLY solution, it works, but it is obviously so silly

prefix : <http://test.example/> 
select ?anotherUser ?countOfSharedLikedItems ?maxSharedLikedItems 
WHERE {
{
   select  
      ?anotherUser  
      (COUNT(?anotherItem) as ?countOfSharedLikedItems)  
   WHERE 
   {     
      values ?user { :ania }.
      ?anotherUser :likes ?anotherItem.

      filter (?anotherUser != ?user)
      filter exists { ?user :likes ?anotherItem }
   }
   group by ?anotherUser
   order by desc(?countOfSharedLikedItems)
}
{
   select (MAX(?countOfSharedLikedItems) as ?maxSharedLikedItems) 
   WHERE 
   {
       select  
          ?anotherUser  
          (COUNT(?anotherItem) as ?countOfSharedLikedItems)  
       WHERE 
       {
          values ?user { :ania }.
          ?anotherUser :likes ?anotherItem.

          filter (?anotherUser != ?user)
          filter exists { ?user :likes ?anotherItem }
       }
       group by ?anotherUser
   }
}
}

could you check please and suggest another better solution?

1条回答
劫难
2楼-- · 2019-07-08 08:30

The structure of your outermost query has a fatal issue in these parts:

select  
   (MAX(?countOfSharedLikedItems) as ?max) 
   (COUNT(?anotherItem) as ?countOfSharedLikedItems)  
WHERE { ... }
group by ?anotherUser

You need to fully understand what group by does. The triples in the where part provide a bunch of result rows. E.g., in a query like

select * where {
  ?s ?p ?o 
}

the result is a bunch of rows, each of which has a value for each of the variables. When you add a group by clause, you're saying that you want to partition those rows up into a bunch of sets. E.g., if the results had been

s1 p1 o1
s1 p2 o2
s2 p1 o2
s2 p1 o3
s3 p1 o2

and you grouped by ?s, then you partition those results into something like this:

s1
   [ p1 o1 ]
   [ p2 o2 ]

s2 
   [ p1 o2 ]
   [ p1 o3 ]

s3 
   [ p1 o2 ]

For each unique s value, you still have a bunch of rows, each of which provides a p and an o value. The aggregation functions operate over those bunches. So when you do something like

select ?s (max(?o) as ?oMax) where {
  ?s ?p ?o
}
group by ?s

the max aggregate gets to work on each of the bunches, i.e., on [o1, o2], [o2, o3], and [o2], and produces a single value from each one. So something like

select  
   (count(?o) as ?numO)
   (max(?numO) as ?oMax) 
group by ?s

doesn't make sense. When you group by ?s, the count operates within each bunch, and there are some values, so you can count them. But max also tries to operate within each bunch, but the bunches don't have any binding of ?numO to look at. And even if you could use the value from count(?o), there'd only be one per bunch, so max would just be returning that value.

查看更多
登录 后发表回答