I need to use the Wikipedia API's query action, or any other API such as OpenSearch, to fetch a simple list of pages with some properties.
Input: a list of page (article) titles or ids.
Output: a list of pages, each with the following properties:
- page id
- title
- snippet/description (like in the OpenSearch API)
- page url
- image url (like in the OpenSearch API)
A result similar to this:
http://en.wikipedia.org/w/api.php?action=opensearch&search=miles%20davis&limit=20&format=xml
The difference is that the result should include page ids, and it should not be a search but an exact list of pages specified by either titles or pageids.
This should be a fairly simple thing, but I have been stuck on it for quite some time, trying all kinds of URL combinations from the MW API manual without success.
I don't think there is any way other than the OpenSearch API to fetch OpenSearch data, but depending on which Wikipedia you are interested in, there might be other extensions installed to help you. Taking English Wikipedia as an example, we can make use of the MobileFrontend and PageImages extensions, which happen to be installed there.
- Title and URL are available from the native MediaWiki API. To get the URL, you can use prop=info, and specify with inprop=url that it is the URL you are interested in.
- The most prominent image of a page is returned by prop=pageimages, thanks to PageImages.
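- MobileFrontend adds a property called extracts (on current MediaWiki installations it is provided by the TextExtracts extension), which you can use with the directive exintro to get only the introduction, that is, the content before the first section heading. Note, however, that MediaWiki markup is complex, and the result might not always be perfect.
If we put it all together in one single query, it would be something like this: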
http://en.wikipedia.org/w/api.php?action=query&pageids=21482&prop=pageimages|info|extracts&inprop=url&exintro
giving this:
<api>
<query>
<pages>
<page pageid="21482" ns="0" title="Nairobi" pageimage="Nairobi_Montage.jpg" contentmodel="wikitext" pagelanguage="en" touched="2014-02-06T06:10:01Z" lastrevid="594161616" counter="" length="89157" fullurl="http://en.wikipedia.org/wiki/Nairobi" editurl="http://en.wikipedia.org/w/index.php?title=Nairobi&amp;action=edit">
<thumbnail source="http://upload.wikimedia.org/wikipedia/commons/thumb/6/66/Nairobi_Montage.jpg/45px-Nairobi_Montage.jpg" width="45" height="50" />
<extract xml:space="preserve">
<p><b>Nairobi</b> /naɪˈroʊbi/ is the [...]
</extract>
</page>
</pages>
</query>
</api>
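The combined query above can also be built and read from a script. The following is a minimal Python sketch under a few assumptions: it requests format=json (the URL above does not set a format), it reads the default JSON layout in which query.pages is keyed by page id, and the names build_page_query_url, summarize_pages and fetch_json are made up for this illustration. The User-Agent value is a placeholder.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def build_page_query_url(page_ids):
    """Build one query URL covering several page ids (hypothetical helper)."""
    params = {
        "action": "query",
        "pageids": "|".join(str(pid) for pid in page_ids),
        "prop": "pageimages|info|extracts",
        "inprop": "url",
        "exintro": 1,
        "format": "json",  # assumption: JSON instead of the XML shown above
    }
    return API_ENDPOINT + "?" + urlencode(params)


def summarize_pages(response):
    """Flatten query.pages into a list of small dicts, one per page."""
    pages = response.get("query", {}).get("pages", {})
    summaries = []
    for page in pages.values():
        summaries.append({
            "pageid": page.get("pageid"),
            "title": page.get("title"),
            "url": page.get("fullurl"),
            # Not every page has a page image, so fall back to None.
            "image": page.get("thumbnail", {}).get("source"),
            "extract": page.get("extract"),
        })
    return summaries


def fetch_json(url):
    """Download and decode a JSON response (placeholder User-Agent)."""
    request = Request(url, headers={"User-Agent": "example-script/0.1"})
    with urlopen(request) as response:
        return json.load(response)
```

Calling summarize_pages(fetch_json(build_page_query_url([21482]))) should then yield one entry for Nairobi, with the same fields as the XML attributes above.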
Here is a multistep process that first gets a list of Wikipedia article titles and properties, and then gets the page IDs and URLs of those articles.
Please note: It does use a portion of a previous answer: "Title and url are available from the native MediaWiki API. To get the url, you can use prop=info, and specify with inprop=url that it is the url you are interested in."
If you would like to use the Wikipedia API in your own application to search Wikipedia for a list of articles about a certain topic, and you want the answer in JSON format, then you could use the following URL:
https://en.wikipedia.org/w/api.php?action=query&list=search&srsearch=REPLACE_ME_WITH_SEARCH_TOPIC&format=json&callback=?
If the raw results are hard to read, replace "format=json&callback=?" with "formatversion=2", as in the following example:
https://en.wikipedia.org/w/api.php?action=query&list=search&srsearch=REPLACE_ME_WITH_SEARCH_TOPIC&formatversion=2
The following example returns a batch of article titles and properties for "Thailand" in JSON format. After that, I use the resulting titles to find the page IDs and URLs of those articles.
URL step 1:
https://en.wikipedia.org/w/api.php?action=query&list=search&srsearch=thailand&format=json&callback=?
Step 1 returns JSON that contains the list of titles I need. In step 2, I pass those titles to another API query to get the page IDs and URLs of those articles.
Here are the Wikipedia article titles from the resulting JSON of step 1:
- Thailand
- Outline of Thailand
- Geography of Thailand
- Economy of Thailand
- Football in Thailand
- Southern Thailand
- Government of Thailand
- Northern Thailand
- Culture of Thailand
- Cinema of Thailand
URL step 2:
https://en.wikipedia.org/w/api.php?action=query&titles=Thailand|Outline%20of%20Thailand|Geography%20of%20Thailand|Economy%20of%20Thailand|Football%20in%20Thailand|Southern%20Thailand|Government%20of%20Thailand|Northern%20Thailand|Culture%20of%20Thailand|Cinema%20of%20Thailand&prop=info&inprop=url&format=json&callback=?
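The two steps can be chained in a script. Below is a rough Python sketch of that idea, not a definitive implementation: the function names are hypothetical, the responses are assumed to use the default JSON layout (query.search as a list, query.pages keyed by page id), and the "callback=?" parameter is left out because it is a jQuery JSONP placeholder that a plain HTTP client does not need.

```python
from urllib.parse import urlencode

API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def build_search_url(topic):
    """Step 1: URL that searches for articles about a topic."""
    params = {
        "action": "query",
        "list": "search",
        "srsearch": topic,
        "format": "json",
    }
    return API_ENDPOINT + "?" + urlencode(params)


def titles_from_search(response):
    """Pull the article titles out of a step 1 JSON response."""
    hits = response.get("query", {}).get("search", [])
    return [hit["title"] for hit in hits]


def build_info_url(titles):
    """Step 2: URL that asks for page info, including URLs, by title."""
    params = {
        "action": "query",
        "titles": "|".join(titles),
        "prop": "info",
        "inprop": "url",
        "format": "json",
    }
    return API_ENDPOINT + "?" + urlencode(params)


def ids_and_urls(response):
    """Map each title in a step 2 response to (page id, full URL)."""
    pages = response.get("query", {}).get("pages", {})
    return {
        page.get("title"): (page.get("pageid"), page.get("fullurl"))
        for page in pages.values()
    }
```

The URLs can be fetched with any HTTP client and the decoded JSON passed to titles_from_search and ids_and_urls. The API caps the number of titles per request (50 for ordinary users), so a longer list would need to be split into batches.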