How to get Infobox from a Wikipedia article by Med

Posted 2019-01-07 14:49

Wikipedia articles may have Infobox templates. With the following call I can get the first section of an article, which includes the Infobox.

http://en.wikipedia.org/w/api.php?action=parse&pageid=568801&section=0&prop=wikitext

What I want is a query which will return only Infobox data. Is this possible?

4 answers
在下西门庆
#2 · 2019-01-07 15:16

Instead of parsing infoboxes yourself, which is quite complicated, take a look at DBpedia, which has Wikipedia infoboxes extracted as database objects.
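As a rough illustration of the DBpedia route, here is a minimal sketch. It assumes DBpedia serves a JSON description of each resource at https://dbpedia.org/data/<Title>.json and that raw infobox fields sit under the http://dbpedia.org/property/ namespace. The function names are made up for this example, and titles with unusual characters may need different encoding.

```python
import json
import urllib.parse
import urllib.request

# Assumed namespace for properties extracted straight from infoboxes
DBP_PROPERTY = "http://dbpedia.org/property/"


def infobox_properties(data, title):
    """Pick the infobox-derived properties for `title` out of DBpedia's JSON."""
    resource = "http://dbpedia.org/resource/" + title.replace(" ", "_")
    props = {}
    for predicate, values in data.get(resource, {}).items():
        if predicate.startswith(DBP_PROPERTY):
            key = predicate[len(DBP_PROPERTY):]
            props[key] = [v["value"] for v in values]
    return props


def fetch_infobox_properties(title):
    """Download the JSON description of `title` and filter it (needs network access)."""
    name = urllib.parse.quote(title.replace(" ", "_"))
    with urllib.request.urlopen("https://dbpedia.org/data/" + name + ".json") as resp:
        return infobox_properties(json.load(resp), title)
```

Calling fetch_infobox_properties("Hydrogen") should then give a dict of field names to value lists, provided the endpoint behaves as assumed.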

chillily
#3 · 2019-01-07 15:18

Building on @garry's answer, you can have Wikipedia parse the infobox into HTML for you via the rvparse parameter, like so:

http://en.wikipedia.org/w/api.php?action=query&prop=revisions&rvprop=content&format=json&titles=Scary%20Monsters%20and%20Nice%20Sprites&rvsection=0&rvparse

Note that neither method will return just the infobox. But from the HTML content, you can extract (via, e.g., BeautifulSoup) the table with class infobox.

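In Python, you can do something like the following:

import requests

# same request as the URL above
url = "http://en.wikipedia.org/w/api.php?action=query&prop=revisions&rvprop=content&format=json&titles=Scary%20Monsters%20and%20Nice%20Sprites&rvsection=0&rvparse"
resp = requests.get(url).json()
# 'pages' is keyed by page ID, so take the first (and only) entry
page_one = next(iter(resp['query']['pages'].values()))
revisions = page_one.get('revisions', [])
# in the default JSON format the revision content is stored under the '*' key
html = revisions[0]['*']
# now parse the html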
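For the extraction step, BeautifulSoup is the convenient choice. The sketch below swaps in the standard library's html.parser instead, so it runs without extra installs. The class name InfoboxExtractor is made up for this example, and it assumes the infobox is a table whose class attribute contains "infobox" with one field per table row.

```python
from html.parser import HTMLParser


class InfoboxExtractor(HTMLParser):
    """Collect the cell texts of each row in the first <table class="infobox ...">."""

    def __init__(self):
        super().__init__()
        self.depth = 0     # nesting depth of <table> tags, counted from the infobox
        self.done = False  # True once the first infobox has been closed
        self.cell = None   # 'th' or 'td' while inside a top-level cell
        self.row = []      # cell texts of the row being read
        self.rows = []     # finished rows

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        classes = (dict(attrs).get("class") or "").split()
        if tag == "table" and (self.depth or "infobox" in classes):
            self.depth += 1
        elif self.depth == 1 and tag == "tr":
            self.row = []
        elif self.depth == 1 and tag in ("th", "td"):
            self.cell = tag
            self.row.append("")

    def handle_endtag(self, tag):
        if self.done or not self.depth:
            return
        if tag == "table":
            self.depth -= 1
            self.done = self.depth == 0
        elif self.depth == 1 and tag in ("th", "td"):
            self.cell = None
        elif self.depth == 1 and tag == "tr" and self.row:
            # collapse runs of whitespace inside each cell
            self.rows.append([" ".join(c.split()) for c in self.row])
            self.row = []

    def handle_data(self, data):
        if self.depth >= 1 and self.cell and self.row:
            self.row[-1] += data
```

Usage would be along the lines of parser = InfoboxExtractor(); parser.feed(html); after that, parser.rows holds one list of cell texts per infobox row. Text from tables nested inside a cell is folded into that cell.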
我想做一个坏孩纸
#4 · 2019-01-07 15:29

You can do it with a URL call to the Wikipedia API like this:

http://en.wikipedia.org/w/api.php?action=query&prop=revisions&rvprop=content&format=xmlfm&titles=Scary%20Monsters%20and%20Nice%20Sprites&rvsection=0

Replace the value of titles= with your page title, and change format=xmlfm to format=json if you want the article in JSON format.
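This call returns the wikitext of the whole lead section, not only the infobox. One way to cut the infobox out of that wikitext is to find the opening {{Infobox and count braces until the template closes. This is a minimal sketch under that assumption: extract_infobox is a hypothetical helper, and it does not handle braces hidden inside nowiki tags or comments.

```python
import re


def extract_infobox(wikitext):
    """Return the first {{Infobox ...}} template in wikitext, or None if absent."""
    match = re.search(r"\{\{\s*infobox", wikitext, re.IGNORECASE)
    if match is None:
        return None
    start = match.start()
    depth = 0
    i = start
    while i < len(wikitext) - 1:
        pair = wikitext[i:i + 2]
        if pair == "{{":
            depth += 1  # entering a (possibly nested) template
            i += 2
        elif pair == "}}":
            depth -= 1  # leaving a template
            i += 2
            if depth == 0:
                return wikitext[start:i]
        else:
            i += 1
    return None  # braces never balanced
```

The returned string still contains the raw "| field = value" lines, so splitting those into a dictionary would be a further step.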

爷的心禁止访问
#5 · 2019-01-07 15:30

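If the page has a right-side infobox, use this URL to obtain it in plain-text form. My example uses the element Hydrogen; all you need to do is replace "hydrogen" in the URL with your title. Note that this only works when the infobox lives on its own template page, as it does for the chemical elements.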

https://en.wikipedia.org/w/index.php?action=raw&title=Template:Infobox%20hydrogen

If you are looking for JSON format, use this URL, but it's not pretty.

https://en.wikipedia.org/w/api.php?action=parse&page=Template:Infobox%20hydrogen&format=json
