
Get an article summary from the MediaWiki API

Posted 2019-04-17 12:15

Question:

I am looking for a MediaWiki API that returns a short description for any query string. For example, if I search for Nicolas Cage, it should return a short description of him.

I tried http://en.wikipedia.org/w/api.php?format=json&action=query&titles=Nicolas%20Cage&prop=revisions&rvprop=content

I am not sure whether prop=revisions is right. My intention is to get a short description based on the latest version of the page.

I also need an API that gives the link to the Wikipedia page (web or mobile) for the query string. For example, for Nicolas Cage it should return http://en.wikipedia.org/wiki/Nicolas_Cage.

Answer 1:

  1. MediaWiki has no built-in notion of a page summary. You can, however, get the lead section of a page (section 0, which begins with the first paragraph) like this: http://en.wikipedia.org/w/api.php?action=parse&page=Nicolas_Cage&prop=text&section=0
    If the wiki has the PageSummaries extension installed, you can use it to get exactly what you are asking for (as in the example on the extension's description page).

  2. To find pages matching a string, use the opensearch action, like this: http://en.wikipedia.org/w/api.php?action=opensearch&search=Nicolas%20cage&namespace=0

Edit: @Bergi pointed out in the comments that opensearch also returns a summary of the page. I had somehow missed that.
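Below is a minimal Python sketch of the opensearch call from item 2. It uses only the standard library and assumes the English Wikipedia endpoint. It also assumes the standard four-element opensearch JSON response: [query, titles, descriptions, urls]. The urls element answers the question's request for a link to the page. The descriptions may be empty strings, depending on how the wiki is configured. The User-Agent string is a hypothetical placeholder.

```python
import json
import urllib.parse
import urllib.request

API = "https://en.wikipedia.org/w/api.php"  # assumed endpoint (English Wikipedia)
HEADERS = {"User-Agent": "summary-example/0.1"}  # hypothetical User-Agent string


def opensearch_url(query, limit=5):
    """Build an action=opensearch URL restricted to the main (article) namespace."""
    params = {
        "action": "opensearch",
        "search": query,
        "namespace": 0,
        "limit": limit,
        "format": "json",
    }
    return API + "?" + urllib.parse.urlencode(params)


def parse_opensearch(payload):
    """Turn the [query, titles, descriptions, urls] array into a list of dicts."""
    _query, titles, descriptions, urls = payload
    return [
        {"title": t, "description": d, "url": u}
        for t, d, u in zip(titles, descriptions, urls)
    ]


def opensearch(query, limit=5):
    """Fetch and parse opensearch results (requires network access)."""
    req = urllib.request.Request(opensearch_url(query, limit), headers=HEADERS)
    with urllib.request.urlopen(req) as resp:
        return parse_opensearch(json.load(resp))
```

For example, `opensearch("Nicolas Cage")[0]["url"]` should give the article link for the top match. The fetch is kept separate from parsing so that the parsing can be checked without network access.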



Answer 2:

Say you want the summary for the search string Nicolas Cage.

Step 1. Get the page id:
https://en.wikipedia.org/w/api.php?action=query&list=search&srsearch=Nicolas%20Cage&format=json&srlimit=1
Step 2. Use this page id to get section 0 of the page:
https://en.wikipedia.org/w/api.php?action=parse&section=0&pageid=21111&prop=text&format=json
Step 3. Parse the result as required. In Python, use BeautifulSoup to select the target tags; get_text() then gives plain text.
action=parse with a pageid already works on the latest revision. To get the latest revision's content directly, use prop=revisions with rvprop=content; for more options, see the MediaWiki API documentation.
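Here is a minimal sketch of Steps 1 to 3, assuming the English Wikipedia endpoint. The page id is read from the search response rather than hard-coded. The answer suggests BeautifulSoup for Step 3. To stay dependency-free, this sketch swaps in the standard library's html.parser and keeps only the text of <p> elements. The choice to skip <sup> reference markers, and the helper names, are illustrative assumptions.

```python
import json
import urllib.parse
import urllib.request
from html.parser import HTMLParser

API = "https://en.wikipedia.org/w/api.php"  # assumed endpoint (English Wikipedia)
HEADERS = {"User-Agent": "summary-example/0.1"}  # hypothetical User-Agent string


def get_json(params):
    """GET api.php with the given parameters and decode the JSON body (needs network)."""
    url = API + "?" + urllib.parse.urlencode(params)
    req = urllib.request.Request(url, headers=HEADERS)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def first_pageid(search_json):
    """Step 1: page id of the top hit in a list=search response, or None."""
    hits = search_json.get("query", {}).get("search", [])
    return hits[0]["pageid"] if hits else None


def section0_html(parse_json):
    """Step 2: with format=json, action=parse puts the HTML under text['*']."""
    return parse_json["parse"]["text"]["*"]


class ParagraphText(HTMLParser):
    """Step 3: collect the plain text of each <p>, skipping <sup> reference markers."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True: entities such as &amp; arrive decoded
        self.in_p = False
        self.skip = 0
        self.paragraphs = []
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.in_p = True
            self._buf = []
        elif tag == "sup" and self.in_p:
            self.skip += 1

    def handle_endtag(self, tag):
        if tag == "p" and self.in_p:
            self.in_p = False
            text = "".join(self._buf).strip()
            if text:  # drop empty placeholder paragraphs
                self.paragraphs.append(text)
        elif tag == "sup" and self.skip:
            self.skip -= 1

    def handle_data(self, data):
        if self.in_p and not self.skip:
            self._buf.append(data)


def html_to_paragraphs(html):
    parser = ParagraphText()
    parser.feed(html)
    parser.close()
    return parser.paragraphs


def lead_summary(query):
    """Search, fetch section 0 of the top hit, and return its first non-empty paragraph."""
    search = get_json({"action": "query", "list": "search", "srsearch": query,
                       "srlimit": 1, "format": "json"})
    pageid = first_pageid(search)
    if pageid is None:
        return None
    parsed = get_json({"action": "parse", "pageid": pageid, "section": 0,
                       "prop": "text", "format": "json"})
    paragraphs = html_to_paragraphs(section0_html(parsed))
    return paragraphs[0] if paragraphs else None
```

Calling `lead_summary("Nicolas Cage")` performs both requests. Keeping only <p> text leaves out infobox tables, which is usually what a short description needs. Adjust the tag selection if your wiki's markup differs.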

Alternate Solution:
Step 1. Get the page title from step 1 above.
Step 2. Use the title as follows: https://en.wikipedia.org/w/api.php?format=json&action=query&prop=extracts&exintro=&explaintext=&titles=Nicolas%20Cage
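The alternate solution can be sketched in Python as follows, assuming the English Wikipedia endpoint and the default JSON format. In that format, query.pages is an object keyed by page id. exintro and explaintext are boolean flags, so passing them with empty values (as in the URL above) turns them on. The helper names and User-Agent string are hypothetical.

```python
import json
import urllib.parse
import urllib.request

API = "https://en.wikipedia.org/w/api.php"  # assumed endpoint (English Wikipedia)
HEADERS = {"User-Agent": "summary-example/0.1"}  # hypothetical User-Agent string


def extract_url(title):
    """Build a prop=extracts query for the intro of one page, as plain text."""
    params = {
        "action": "query",
        "prop": "extracts",
        "exintro": "",      # flag: only the content before the first section heading
        "explaintext": "",  # flag: plain text instead of limited HTML
        "titles": title,
        "format": "json",
    }
    return API + "?" + urllib.parse.urlencode(params)


def parse_extract(payload):
    """Return the extract of the first page in the response, or None.

    A title that does not exist shows up under a negative key such as "-1"
    with no "extract" field, so .get() yields None in that case.
    """
    pages = payload.get("query", {}).get("pages", {})
    first = next(iter(pages.values()), {})
    return first.get("extract")


def fetch_extract(title):
    """Fetch the intro extract for a title (requires network access)."""
    req = urllib.request.Request(extract_url(title), headers=HEADERS)
    with urllib.request.urlopen(req) as resp:
        return parse_extract(json.load(resp))
```

Calling `fetch_extract("Nicolas Cage")` returns the intro as plain text, so no HTML parsing step is needed.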