I want to retrieve with SPARQL the list of the italian cities with more than 100k of population and I'm using the following query:
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT ?city ?name ?pop WHERE {
?city a dbo:Settlement .
?city foaf:name ?name .
?city dbo:populationTotal ?pop .
?city dbo:country ?country .
?city dbo:country dbpedia:Italy .
FILTER (?pop > 100000)
}
In the results I get for example in two different lines (which represent the same entity, but with different names):
http://dbpedia.org/resource/Bologna "Bologna"@en 384038
http://dbpedia.org/resource/Bologna "Comune di Bologna"@en 384038
How can I use SELECT DISTINCT
only in the column ?city
but still having as output the outher columns?
You can use GROUP BY
to group by a specific column and then use the SAMPLE()
aggregate to select one of the values from the other columns e.g.
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT ?city (SAMPLE(?name) AS ?cityName) (SAMPLE(?pop) AS ?cityPop)
WHERE
{
?city a dbo:Settlement .
?city foaf:name ?name .
?city dbo:populationTotal ?pop .
?city dbo:country ?country .
?city dbo:country dbpedia:Italy .
FILTER (?pop > 100000)
}
GROUP BY ?city
So by grouping on the ?city
you get only a single row per city, since you have grouped by ?city
you can't directly select variables that aren't group variables.
You must instead use the SAMPLE()
aggregate to pick one of the values for each of the non-group variables you wish to have in the final results. This will select one of the values of ?name
and ?pop
to return as ?cityName
and ?cityPop
respectively