I need to search for Wikipedia pages that contain some specific words in their full text. To improve the results, I want to restrict them to pages describing entities that are instances of a specific entity.
For searching the full text I can use the Wikipedia APIs, using the query action and the search generator.
For filtering instances of a given entity I can use the Wikidata APIs and a SPARQL query.
Is there a way to execute both operations in a single query that applies both filters?
Since June 2017, it is possible to call out to Wikimedia APIs from Wikidata SPARQL:
SELECT ?wikidata_item ?wikipedia_title {
  SERVICE wikibase:mwapi {
    bd:serviceParam wikibase:endpoint "en.wikipedia.org" .
    bd:serviceParam wikibase:api "Generator" .
    bd:serviceParam mwapi:generator "search" .
    bd:serviceParam mwapi:gsrsearch "triplestore" .
    bd:serviceParam mwapi:gsrlimit "max" .
    ?wikidata_item wikibase:apiOutputItem mwapi:item .
    ?wikipedia_title wikibase:apiOutput mwapi:title .
  }
  # hint:Prior hint:runFirst "true".
  ?wikidata_item wdt:P31 wd:Q3539533 .
  FILTER (?wikipedia_title != "Blazegraph")
}
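If you want to run this from a script with your own search words and class, something like the following should work. It is only a sketch: `build_query` and `run_query` are helper names I made up, the User-Agent string is a placeholder you should replace with your own, and I am assuming the public endpoint at https://query.wikidata.org/sparql with `format=json`. The example-specific FILTER line is left out.

```python
import json
import re
import urllib.parse
import urllib.request

# Assumed public Wikidata Query Service endpoint.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

# Same shape as the query above; doubled braces are literal braces for str.format.
QUERY_TEMPLATE = """\
SELECT ?wikidata_item ?wikipedia_title {{
  SERVICE wikibase:mwapi {{
    bd:serviceParam wikibase:endpoint "en.wikipedia.org" .
    bd:serviceParam wikibase:api "Generator" .
    bd:serviceParam mwapi:generator "search" .
    bd:serviceParam mwapi:gsrsearch "{search}" .
    bd:serviceParam mwapi:gsrlimit "max" .
    ?wikidata_item wikibase:apiOutputItem mwapi:item .
    ?wikipedia_title wikibase:apiOutput mwapi:title .
  }}
  ?wikidata_item wdt:P31 wd:{qid} .
}}
"""


def build_query(search_terms, class_qid):
    """Return the SPARQL text for a full-text search limited to instances of class_qid."""
    if not re.fullmatch(r"Q[1-9][0-9]*", class_qid):
        raise ValueError("class_qid must look like 'Q3539533'")
    # Escape backslashes and double quotes so the terms stay inside the string literal.
    escaped = search_terms.replace("\\", "\\\\").replace('"', '\\"')
    return QUERY_TEMPLATE.format(search=escaped, qid=class_qid)


def run_query(sparql, user_agent="example-script/0.1 (placeholder contact)"):
    """Send the query to the endpoint and return (item URI, page title) pairs."""
    url = WDQS_ENDPOINT + "?" + urllib.parse.urlencode(
        {"query": sparql, "format": "json"}
    )
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request) as response:
        data = json.load(response)
    return [
        (row["wikidata_item"]["value"], row["wikipedia_title"]["value"])
        for row in data["results"]["bindings"]
    ]


if __name__ == "__main__":
    # Needs network access; uses the same search word and class as the query above.
    for item, title in run_query(build_query("triplestore", "Q3539533")):
        print(item, title)
```

Building the query text is kept separate from sending it, so you can print and paste the generated SPARQL into the query service UI to check it before hitting the endpoint from code.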
No, those have completely separate search backends that do not interact. The Wikidata API uses SQL queries; the search API uses Elasticsearch; the SPARQL service uses Blazegraph.