DISTINCT only on one value with SPARQL

2020-07-06 06:39发布

问题:

I want to retrieve with SPARQL the list of the italian cities with more than 100k of population and I'm using the following query:

PREFIX dbo: <http://dbpedia.org/ontology/> 
SELECT ?city ?name ?pop WHERE { 
    ?city a dbo:Settlement .
    ?city foaf:name ?name .
    ?city dbo:populationTotal ?pop .
    ?city dbo:country ?country .
    ?city dbo:country dbpedia:Italy .
  FILTER (?pop > 100000) 
}

In the results I get for example in two different lines (which represent the same entity, but with different names):

http://dbpedia.org/resource/Bologna "Bologna"@en 384038

http://dbpedia.org/resource/Bologna "Comune di Bologna"@en 384038

How can I use SELECT DISTINCT only in the column ?city but still having as output the outher columns?

回答1:

You can use GROUP BY to group by a specific column and then use the SAMPLE() aggregate to select one of the values from the other columns e.g.

PREFIX dbo: <http://dbpedia.org/ontology/> 

SELECT ?city (SAMPLE(?name) AS ?cityName) (SAMPLE(?pop) AS ?cityPop)
WHERE
{ 
    ?city a dbo:Settlement .
    ?city foaf:name ?name .
    ?city dbo:populationTotal ?pop .
    ?city dbo:country ?country .
    ?city dbo:country dbpedia:Italy .
    FILTER (?pop > 100000) 
}
GROUP BY ?city

So by grouping on the ?city you get only a single row per city, since you have grouped by ?city you can't directly select variables that aren't group variables.

You must instead use the SAMPLE() aggregate to pick one of the values for each of the non-group variables you wish to have in the final results. This will select one of the values of ?name and ?pop to return as ?cityName and ?cityPop respectively