Find main category for article using Wikipedia API

Posted 2019-04-11 08:38

Question:

I have a list of articles and I want to find the main category of each article.

Wikipedia lists its main categories here: http://en.wikipedia.org/wiki/Portal:Contents/Categories

I am able to find the categories each article belongs to using:

http://en.wikipedia.org/w/api.php?action=query&prop=categories&titles=%s&format=xml
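As a rough sketch of the query above, assuming Python 3.9+ and only the standard library, the request can be built and its JSON form parsed as shown below. The `cllimit`, `clshow` and `formatversion=2` parameters are standard MediaWiki API options that the original URL does not use; the function names and the User-Agent string are hypothetical, and continuation (for pages with more categories than one response returns) is not handled.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API_URL = "https://en.wikipedia.org/w/api.php"

def categories_url(title: str) -> str:
    """Build a prop=categories query listing the categories a page belongs to."""
    params = {
        "action": "query",
        "prop": "categories",
        "titles": title,
        "cllimit": "max",      # as many categories per request as the API allows
        "clshow": "!hidden",   # skip hidden maintenance categories
        "format": "json",
        "formatversion": "2",  # "pages" is a list instead of a dict keyed by page ID
    }
    return API_URL + "?" + urlencode(params)

def fetch_json(url: str) -> dict:
    """Fetch and decode one API response (needs network access)."""
    # Hypothetical User-Agent; Wikimedia asks API clients to identify themselves.
    req = Request(url, headers={"User-Agent": "category-lookup-sketch/0.1"})
    with urlopen(req) as resp:
        return json.load(resp)

def parse_categories(response: dict) -> list[str]:
    """Collect category titles from a decoded formatversion=2 response."""
    titles = []
    for page in response.get("query", {}).get("pages", []):
        for cat in page.get("categories", []):
            titles.append(cat["title"])
    return titles
```

For example, `parse_categories(fetch_json(categories_url("Dog")))` would return the live category titles of the Dog article, each prefixed with `Category:`.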

I am also able to check whether an article is in a given category:

http://en.wikipedia.org/w/api.php?action=query&titles=Dog&prop=categories&clcategories=Category:Domesticated%20animals&format=xml
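A minimal sketch of the same membership check in Python, with two caveats: `clcategories` appears to expect full category titles including the `Category:` prefix, and it only tests direct membership (it does not look at parent categories). The function names are hypothetical, and the sketch assumes that when the page is not in the category, the response carries no non-empty `categories` list for it.

```python
from urllib.parse import urlencode

API_URL = "https://en.wikipedia.org/w/api.php"

def membership_url(title: str, category: str) -> str:
    """Build a query asking whether `title` is directly in `category`."""
    if not category.startswith("Category:"):
        # clcategories takes full category titles, namespace prefix included
        category = "Category:" + category
    params = {
        "action": "query",
        "prop": "categories",
        "titles": title,
        "clcategories": category,  # only report this category, if the page has it
        "format": "json",
        "formatversion": "2",
    }
    return API_URL + "?" + urlencode(params)

def is_direct_member(response: dict) -> bool:
    """True if any page in the filtered response still lists a category."""
    pages = response.get("query", {}).get("pages", [])
    return any(page.get("categories") for page in pages)
```

A `True` result only means the category is attached to the page itself; finding a "main" category still requires walking up through parent categories yourself.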

This tells me whether Dog belongs to the category "Domesticated animals", but that is not quite what I want. I want to find which main category "Domesticated animals" itself falls under. Is this possible using the API?

Answer 1:

First, there is no such thing as a "Wikipedia API"; there is the MediaWiki (web) API, which Wikipedia runs on. Knowing this will help you find documentation for the existing tools: https://www.mediawiki.org/wiki/API:Main_Page

That documentation shows that no API module does the category recursion for you. Why? Because 1) it is extremely inefficient, and 2) the recursion might go anywhere or never end, since the category graph contains cycles.
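Since the API will not recurse for you, one way to do it yourself is a breadth-first search up the category graph, with a `seen` set so cycles cannot loop forever and a depth cap so the search stays bounded. This is a sketch under assumptions: `get_parents` is a hypothetical callable that would, in practice, run the `prop=categories` query on an article or on a `Category:` page (a category page's own categories are its parents), and `targets` is whatever set you choose to treat as "main" categories, for example those listed on Portal:Contents/Categories. The graph in the demo is made up and does not reflect Wikipedia's real categories.

```python
from collections import deque
from typing import Callable, Iterable, Optional

def find_main_category(
    start: str,
    get_parents: Callable[[str], Iterable[str]],
    targets: set[str],
    max_depth: int = 10,
) -> Optional[list[str]]:
    """Search upward from `start` until a category in `targets` is reached.

    Returns the path [start, ..., target] with the fewest parent links, or None
    if no target lies within `max_depth` links.
    """
    if start in targets:
        return [start]
    seen = {start}              # never revisit a node, so cycles are harmless
    queue = deque([[start]])    # each entry is the path from start to a node
    while queue:
        path = queue.popleft()
        if len(path) - 1 >= max_depth:
            continue            # do not expand nodes at the depth limit
        for parent in get_parents(path[-1]):
            if parent in seen:
                continue
            seen.add(parent)
            new_path = path + [parent]
            if parent in targets:
                return new_path
            queue.append(new_path)
    return None

if __name__ == "__main__":
    # Hypothetical category graph, including a cycle (Dogs -> Canis -> Dogs).
    graph = {
        "Dog": ["Category:Domesticated animals", "Category:Dogs"],
        "Category:Dogs": ["Category:Canis"],
        "Category:Canis": ["Category:Dogs"],
        "Category:Domesticated animals": ["Category:Animals by adaptation"],
        "Category:Animals by adaptation": ["Category:Animals"],
    }
    path = find_main_category("Dog", lambda c: graph.get(c, []), {"Category:Animals"})
    print(path)
    # ['Dog', 'Category:Domesticated animals', 'Category:Animals by adaptation', 'Category:Animals']
```

Returning the first target reached means "main category" is taken to be the nearest one by number of links; an article can of course lie under several main categories, and a different rule would be just as defensible.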

However, there is now a solution by Magnus Manske, the reverse_tree tool at https://tools.wmflabs.org/catscan2/reverse_tree.php?doit=1&language=en&project=wikipedia&title=Dog&namespace=0 which, for Dog, reports "Maximum depth: 61 levels; Total categories along the way: 7988". Using that definition, the "root" category of [[Dog]], i.e. its farthest parent category, is "Industry by country". Probably not what you expected! Note also that, from the English Wikipedia's perspective, the root category of every article is ultimately the same: [[Category:Contents]].
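To make the "farthest parent category" idea concrete, here is a sketch that computes each ancestor's shortest distance (in parent links) from the starting page and returns the ancestors at the greatest distance. It takes a hypothetical `get_parents` callable, which in practice would wrap the `prop=categories` query. This is only one reading of "farthest": the reverse_tree tool may measure depth differently, so its numbers need not match. Shortest-path distance is used here because it stays well defined when the category graph has cycles.

```python
from collections import deque
from typing import Callable, Iterable

def farthest_ancestors(
    start: str,
    get_parents: Callable[[str], Iterable[str]],
    max_depth: int = 100,
) -> tuple[int, list[str]]:
    """Return (depth, ancestors) for the ancestors farthest from `start`.

    Depth is the shortest number of parent links from `start`, found with
    breadth-first search; nodes at `max_depth` are not expanded further.
    """
    depth = {start: 0}          # doubles as the visited set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if depth[node] >= max_depth:
            continue
        for parent in get_parents(node):
            if parent not in depth:
                depth[parent] = depth[node] + 1
                queue.append(parent)
    deepest = max(depth.values())
    return deepest, sorted(c for c, d in depth.items() if d == deepest)
```

With a real `get_parents`, `len(depth) - 1` inside the function would correspond to the "total categories along the way" figure, and a large `max_depth` can mean thousands of API calls, which is exactly the inefficiency the API avoids by not offering recursion.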