Wikipedia API and SPARQL in a single query

Posted 2019-06-27 06:21

Question:

I need to search for Wikipedia pages that contain specific words in their full text. To improve the results, I want to restrict them to pages describing entities that are instances of a given class.

For searching the full text I can use the Wikipedia APIs, using the query action and the search generator.
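
For reference, a request of this kind might be built as in the sketch below. It only constructs the URL and sends nothing. The helper name `build_search_url` and the search term are illustrative, not from the original post. Adding `prop=pageprops` with `ppprop=wikibase_item` asks the API to also return each page's Wikidata item ID, which a later filtering step would need.

```python
from urllib.parse import urlencode

API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def build_search_url(search_term, limit=50):
    """Build a MediaWiki API URL for a full-text search (hypothetical helper)."""
    params = {
        "action": "query",          # the query action
        "format": "json",
        "generator": "search",      # the search generator
        "gsrsearch": search_term,   # full-text search string
        "gsrlimit": limit,          # number of pages to return
        "prop": "pageprops",
        "ppprop": "wikibase_item",  # include the linked Wikidata item ID
    }
    return API_ENDPOINT + "?" + urlencode(params)


url = build_search_url("triplestore")
print(url)
```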

For filtering instances of a given entity I can use the Wikidata APIs and a SPARQL query.
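
As a rough sketch of that second step, the function below builds a SPARQL query that keeps only the candidate items that are instances (`wdt:P31`) of a given class. The function name and the item IDs in the example call are placeholders chosen for illustration. The query relies on the `wd:` and `wdt:` prefixes that the Wikidata Query Service predefines.

```python
import re

QID_PATTERN = re.compile(r"Q[1-9][0-9]*")


def build_instance_filter_query(item_ids, class_id):
    """Return a SPARQL query keeping only items that are instances of class_id."""
    # Reject anything that is not a plain item ID, so that no arbitrary
    # text ends up inside the query string.
    for value in [class_id, *item_ids]:
        if not QID_PATTERN.fullmatch(value):
            raise ValueError(f"not a Wikidata item ID: {value!r}")
    values = " ".join(f"wd:{item_id}" for item_id in item_ids)
    return (
        "SELECT ?item WHERE {\n"
        f"  VALUES ?item {{ {values} }}\n"
        f"  ?item wdt:P31 wd:{class_id} .\n"
        "}"
    )


# Placeholder IDs, used only to show the shape of the generated query.
query = build_instance_filter_query(["Q42", "Q64"], "Q5")
print(query)
```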

Is there a way to execute both operations in a single query that applies both filters?

Answer 1:

Since June 2017, it has been possible to call out to Wikimedia APIs from within a SPARQL query on the Wikidata Query Service:

SELECT ?wikidata_item ?wikipedia_title {
  # Full-text search on English Wikipedia through the MediaWiki API
  SERVICE wikibase:mwapi {
    bd:serviceParam wikibase:endpoint "en.wikipedia.org" .
    bd:serviceParam wikibase:api "Generator" .
    bd:serviceParam mwapi:generator "search" .
    bd:serviceParam mwapi:gsrsearch "triplestore" .
    bd:serviceParam mwapi:gsrlimit "max" .
    ?wikidata_item wikibase:apiOutputItem mwapi:item .
    ?wikipedia_title wikibase:apiOutput mwapi:title .
  }
  # hint:Prior hint:runFirst "true".
  # Keep only items that are instances of (P31) the given class
  ?wikidata_item wdt:P31 wd:Q3539533 .
  FILTER (?wikipedia_title != "Blazegraph")
}
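
A query like this can also be sent from a script. The following is a minimal sketch, assuming the public endpoint at `https://query.wikidata.org/sparql` and using only the Python standard library. The function names and the `User-Agent` string are placeholders, and the `FILTER` from the query above is left out for brevity.

```python
import json
import re
from urllib.parse import urlencode
from urllib.request import Request, urlopen

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def build_combined_query(search_term, class_id):
    """Build the search-plus-filter query for a search term and a class ID."""
    if not re.fullmatch(r"Q[1-9][0-9]*", class_id):
        raise ValueError(f"not a Wikidata item ID: {class_id!r}")
    # Escape backslashes and double quotes for the SPARQL string literal.
    escaped = search_term.replace("\\", "\\\\").replace('"', '\\"')
    return f"""SELECT ?wikidata_item ?wikipedia_title {{
  SERVICE wikibase:mwapi {{
    bd:serviceParam wikibase:endpoint "en.wikipedia.org" .
    bd:serviceParam wikibase:api "Generator" .
    bd:serviceParam mwapi:generator "search" .
    bd:serviceParam mwapi:gsrsearch "{escaped}" .
    bd:serviceParam mwapi:gsrlimit "max" .
    ?wikidata_item wikibase:apiOutputItem mwapi:item .
    ?wikipedia_title wikibase:apiOutput mwapi:title .
  }}
  ?wikidata_item wdt:P31 wd:{class_id} .
}}"""


def run_query(query):
    """Send the query to the endpoint and return the result bindings."""
    url = WDQS_ENDPOINT + "?" + urlencode({"query": query, "format": "json"})
    request = Request(url, headers={
        "User-Agent": "example-script/0.1 (placeholder contact)",
        "Accept": "application/sparql-results+json",
    })
    with urlopen(request) as response:
        return json.load(response)["results"]["bindings"]


query = build_combined_query("triplestore", "Q3539533")
print(query)
```

Calling `run_query(query)` performs the network request. Each returned binding is a dictionary keyed by variable name, so a title would be read as `row["wikipedia_title"]["value"]`.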

Answer 2:

No, those have completely separate search backends that do not interact. The Wikidata API uses SQL queries; the search API uses Elasticsearch; the SPARQL service uses Blazegraph.