How to get only the most recent value from a Wikid

Posted 2020-02-02 01:34

Question:

Suppose I want to get a list of every country (Q6256) and its most recently recorded Human Development Index (P1081) value. The Human Development Index property for the country contains a list of data points taken at different points in time, but I only care about the most recent data. This query will not work because it gets multiple results for each country (one for each Human Development Index data point):

SELECT
?country 
?countryLabel 
?hdi_value
?hdi_date
WHERE {
  ?country wdt:P31 wd:Q6256.
  OPTIONAL {
    ?country p:P1081 ?hdi_statement.
    ?hdi_statement ps:P1081 ?hdi_value.
    ?hdi_statement pq:P585 ?hdi_date.
  }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}

I'm aware of GROUP BY/GROUP_CONCAT, but that will still give me every result when I'd prefer to have just one. GROUP BY/SAMPLE will not work either, since SAMPLE is not guaranteed to return the most recent result.

Any help or link to a relevant example query is appreciated!

P.S. Another thing I'm confused about is why population (P1082) in this query returns only one result per country:

SELECT
?country 
?countryLabel 
?population
WHERE {
  ?country wdt:P31 wd:Q6256.
  OPTIONAL { ?country wdt:P1082 ?population. }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}

while the same query but for HDI returns multiple results per country:

SELECT
?country 
?countryLabel 
?hdi
WHERE {
  ?country wdt:P31 wd:Q6256.
  OPTIONAL { ?country wdt:P1081 ?hdi. }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}

What is different about population and HDI that causes the behavior to be different? When I view the population data for each country on Wikidata I see multiple population points listed, but only one gets returned by the query.

Answer 1:

Both your questions are duplicates, but I'll try to add interesting facts to existing answers.

Question 1 is a duplicate of "SPARQL query to get only results with the most recent date".

This technique does the trick:

FILTER NOT EXISTS {
    ?country p:P1081/pq:P585 ?hdi_date_ .
    FILTER (?hdi_date_ > ?hdi_date)
}

However, you should add this clause outside of the OPTIONAL block: it does not work inside OPTIONAL (and I'm not sure that this is not a bug).
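
Combined with the query from the question, the result might look like the sketch below. It assumes the filter is simply appended after the OPTIONAL block; ?hdi_date_ is only a helper variable standing for "any other HDI date recorded for the same country". Countries without any HDI statement should still appear, with ?hdi_value and ?hdi_date left unbound.

```sparql
SELECT
  ?country
  ?countryLabel
  ?hdi_value
  ?hdi_date
WHERE {
  ?country wdt:P31 wd:Q6256.
  # Full statements (p:/ps:/pq:) are needed to reach the "point in time" qualifier.
  OPTIONAL {
    ?country p:P1081 ?hdi_statement.
    ?hdi_statement ps:P1081 ?hdi_value.
    ?hdi_statement pq:P585 ?hdi_date.
  }
  # Outside OPTIONAL: drop a row if the same country has a later HDI date.
  FILTER NOT EXISTS {
    ?country p:P1081/pq:P585 ?hdi_date_ .
    FILTER (?hdi_date_ > ?hdi_date)
  }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
```

If two statements for one country share the same latest date, both rows are kept, since neither date is strictly greater than the other.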


Question 2 is a duplicate of "Some cities aren't instances of city or big city?".

You can't rely on wdt: predicates here, because the population statements missing from your results are not truthy. They are normal-rank statements, but the same property also has a preferred-rank statement, and only that one is returned.

Truthy statements represent statements that have the best non-deprecated rank for given property. Namely, if there is a preferred statement for property P2, then only preferred statements for P2 will be considered truthy. Otherwise, all normal-rank statements are considered truthy.

The reason why P1082 always has a preferred statement is that this property is processed by PreferentialBot.
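
To see the ranks for yourself, a query along the following lines may help. It is only a sketch: it lists every population statement through the p:/ps: predicates, together with its rank (wikibase:rank) and, where present, its "point in time" qualifier. The variable names ?statement, ?date and ?rank are chosen here for illustration.

```sparql
SELECT
  ?country
  ?countryLabel
  ?population
  ?date
  ?rank
WHERE {
  ?country wdt:P31 wd:Q6256.
  # p:/ps: return all statements, not only the truthy ones that wdt: exposes.
  ?country p:P1082 ?statement.
  ?statement ps:P1082 ?population.
  # The rank is wikibase:PreferredRank, wikibase:NormalRank or wikibase:DeprecatedRank.
  ?statement wikibase:rank ?rank.
  OPTIONAL { ?statement pq:P585 ?date. }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
ORDER BY ?country DESC(?date)
```

Swapping P1082 for P1081 in both places gives the same view for HDI, so the ranks of the two properties can be compared directly.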