Duplicate rows when making SPARQL queries

Posted 2019-07-07 15:06

Question:

I would like to extract speeches regarding a specific agenda item from the European Parliament, which are accessible through a SPARQL interface here: http://linkedpolitics.ops.few.vu.nl/user/query

The schema of the database is found here: http://linkedpolitics.ops.few.vu.nl/home

Through the following query

SELECT ?speaker ?given ?surname ?acronym ?text ?partyLabel ?type
WHERE {
   <http://purl.org/linkedpolitics/eu/plenary/2010-12-16_AgendaItem_4> dcterms:hasPart ?speech.
   ?speech lpv:speaker ?speaker.
   ?speaker foaf:givenName ?given.
   ?speaker foaf:familyName ?surname.
   ?speaker lpv:countryOfRepresentation ?country.
   ?country lpv:acronym ?acronym.
   ?speech lpv:translatedText ?text.
   ?speaker lpv:politicalFunction ?func.
   ?func lpv:institution ?institution.
   ?institution rdfs:label ?partyLabel.
   ?institution rdf:type ?type.
   FILTER(langMatches(lang(?text), "en"))
}

I get the information that I want, but all the rows are duplicated several times. This happens when I try to access the party label through the political function it seems. How do I get unique rows only and what is the reason for duplicates appearing in the first place?

Answer 1:

Your query matches more variables than it selects: ?speech, ?country, ?func and ?institution all appear in the WHERE clause but not in the SELECT list. The rows that look identical therefore probably differ in one of those unselected variables. For example, if you had the data:

:a :hasChild :b .
:a :hasChild :c .

and you ran the query:

select ?parent where {
  ?parent :hasChild ?child .
}

you'd get two rows in the result:

?parent
-------
:a
:a

because there are two bindings that provide solutions: one where ?child is :b, and one where ?child is :c.
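The same effect can be reproduced outside a SPARQL engine. The following is only a minimal Python sketch of the idea, not how any real endpoint is implemented: the `triples` list and the helper names `match_has_child` and `project` are made up for illustration. It matches the pattern once per triple and then drops the unselected variable, which leaves two identical rows.

```python
# Hypothetical in-memory copy of the two example triples.
triples = [
    (":a", ":hasChild", ":b"),
    (":a", ":hasChild", ":c"),
]


def match_has_child(data):
    """Return one binding per triple matching `?parent :hasChild ?child`."""
    return [
        {"parent": s, "child": o}
        for (s, p, o) in data
        if p == ":hasChild"
    ]


def project(bindings, variables):
    """Keep only the selected variables; rows are NOT deduplicated."""
    return [tuple(b[v] for v in variables) for b in bindings]


bindings = match_has_child(triples)   # two bindings, differing only in ?child
rows = project(bindings, ["parent"])  # ?child is dropped, so the rows collide
print(rows)
```

The two bindings are distinct solutions; they only look like duplicates once ?child is projected away.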

To avoid this, you can use SELECT DISTINCT, which removes the "duplicate" result rows. Just change the first line of your query to:

SELECT DISTINCT ?speaker ?given ?surname ?acronym ?text ?partyLabel ?type
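As a rough illustration of what DISTINCT does to the projected rows, here is a small Python sketch. The function name `select_distinct` and the sample `rows` are assumptions made for this example. It also keeps first-seen order, which is a choice of the sketch: a SPARQL engine makes no ordering promise unless the query has ORDER BY.

```python
def select_distinct(rows):
    """Drop rows that are identical in every selected column."""
    seen = set()
    unique = []
    for row in rows:
        if row not in seen:
            seen.add(row)
            unique.append(row)
    return unique


# Hypothetical projected result with a repeated row, as in the ?parent example.
rows = [(":a",), (":a",)]
print(select_distinct(rows))
```

Rows are compared on the selected columns only, so two solutions that differ solely in an unselected variable collapse into one.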



Tags: rdf sparql