Wikipedia articles may have infobox templates. With the following call I can get the first section of an article, which includes the infobox:
http://en.wikipedia.org/w/api.php?action=parse&pageid=568801&section=0&prop=wikitext
What I want is a query which will return only Infobox data. Is this possible?
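The section-0 call returns the whole lead, not just the infobox, so the template has to be cut out on the client side. Below is a minimal sketch of that step. It assumes the infobox is the first template whose name starts with `Infobox` (matched case-insensitively) and that its `{{`/`}}` pairs are balanced. `section0_url`, `extract_infobox` and `fetch_infobox` are hypothetical helpers. `formatversion=2` and the placeholder User-Agent are also assumptions. Only the string-handling parts run without network access.

```python
import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API = "https://en.wikipedia.org/w/api.php"


def section0_url(pageid: int) -> str:
    """The question's request (wikitext of section 0), asking for JSON output."""
    params = {
        "action": "parse",
        "pageid": pageid,
        "section": 0,
        "prop": "wikitext",
        "format": "json",
        "formatversion": 2,  # assumption: v2 returns parse.wikitext as a plain string
    }
    return f"{API}?{urlencode(params)}"


def extract_infobox(wikitext: str) -> Optional[str]:
    """Return the first {{Infobox ...}} call, matching nested {{ }} pairs."""
    start = wikitext.lower().find("{{infobox")
    if start == -1:
        return None
    depth = 0
    i = start
    while i < len(wikitext) - 1:
        pair = wikitext[i:i + 2]
        if pair == "{{":
            depth += 1
            i += 2
        elif pair == "}}":
            depth -= 1
            i += 2
            if depth == 0:
                return wikitext[start:i]
        else:
            i += 1
    return None  # braces never balanced


def fetch_infobox(pageid: int) -> Optional[str]:
    """Network step (not exercised offline): fetch section 0 and cut out the infobox."""
    # Wikimedia asks clients to send a descriptive User-Agent; this one is a placeholder.
    req = Request(section0_url(pageid), headers={"User-Agent": "infobox-sketch/0.1"})
    with urlopen(req) as resp:
        data = json.load(resp)
    return extract_infobox(data["parse"]["wikitext"])
```

If the lead starts with something like `{{Short description|...}}`, `extract_infobox` skips it and returns only the `{{Infobox ...}}` block, including any nested templates such as `{{birth date|...}}`.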
Instead of parsing infoboxes yourself, which is quite complicated, take a look at DBpedia, which has Wikipedia infoboxes extracted as database objects.
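As an illustration of the DBpedia route, here is a sketch that builds a SPARQL request for one article. It assumes the public endpoint is `https://dbpedia.org/sparql` and that raw infobox fields live under the `http://dbpedia.org/property/` (`dbp:`) namespace. The article title is turned into a resource IRI by replacing spaces with underscores. `dbpedia_infobox_query` and `dbpedia_infobox_url` are hypothetical helpers.

```python
from urllib.parse import urlencode

# Assumptions: DBpedia's public SPARQL endpoint, and raw infobox keys living
# under the dbp: namespace (http://dbpedia.org/property/).
ENDPOINT = "https://dbpedia.org/sparql"
PROPERTY_NS = "http://dbpedia.org/property/"


def dbpedia_infobox_query(title: str) -> str:
    """SPARQL selecting every dbp: property/value pair of one article's resource."""
    resource = "http://dbpedia.org/resource/" + title.replace(" ", "_")
    return (
        "SELECT ?property ?value WHERE {\n"
        f"  <{resource}> ?property ?value .\n"
        f'  FILTER(STRSTARTS(STR(?property), "{PROPERTY_NS}"))\n'
        "}"
    )


def dbpedia_infobox_url(title: str) -> str:
    """GET URL for the query, asking for SPARQL JSON results."""
    params = {
        "query": dbpedia_infobox_query(title),
        "format": "application/sparql-results+json",
    }
    return f"{ENDPOINT}?{urlencode(params)}"
```

Filtering on `dbp:` keeps the raw infobox fields. Dropping the filter also returns the cleaned-up `dbo:` ontology properties, which may suit you better.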
Building on @garry's answer, you can have Wikipedia parse the infobox into HTML for you via the `rvparse` parameter. Note that neither method will return just the infobox. But from the HTML content, you can extract (e.g., with BeautifulSoup) the `table` with class `infobox`. In Python this takes only a few lines.

You can do it with a URL call to the Wikipedia API like this:
Replace the `titles=` part with your page title, and change `format=xmlfm` to `format=json` if you want the article in JSON format.

If the infobox lives on its own template page, as it does for the chemical elements, use this URL to obtain it as plain text. Most articles instead call an infobox template inline in their own wikitext. My example uses the element hydrogen. All you need to do is replace "hydrogen" with your element's name.
https://en.wikipedia.org/w/index.php?action=raw&title=Template:Infobox%20hydrogen
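The raw text is still wikitext, so turning it into fields takes a little parsing. Below is a sketch. It assumes the input is a single template call (cut off any `<noinclude>` documentation first) whose named parameters are separated by top-level `|` characters. Pipes inside nested `{{...}}` or `[[...]]` are skipped by tracking depth. `raw_template_url` and `infobox_params` are hypothetical helpers.

```python
from typing import Dict, List
from urllib.parse import quote


def raw_template_url(name: str) -> str:
    """index.php?action=raw URL for the wikitext of Template:Infobox <name>."""
    title = quote(f"Template:Infobox {name}", safe=":")
    return f"https://en.wikipedia.org/w/index.php?action=raw&title={title}"


def infobox_params(call: str) -> Dict[str, str]:
    """Split one {{Template ...}} call into its top-level named parameters."""
    body = call.strip()
    if body.startswith("{{") and body.endswith("}}"):
        body = body[2:-2]
    parts: List[str] = []
    current: List[str] = []
    depth = 0  # nesting level of {{...}} and [[...]] inside the call
    i = 0
    while i < len(body):
        pair = body[i:i + 2]
        if pair in ("{{", "[["):
            depth += 1
            current.append(pair)
            i += 2
        elif pair in ("}}", "]]") and depth > 0:
            depth -= 1
            current.append(pair)
            i += 2
        elif body[i] == "|" and depth == 0:
            parts.append("".join(current))  # a top-level pipe ends a parameter
            current = []
            i += 1
        else:
            current.append(body[i])
            i += 1
    parts.append("".join(current))
    params: Dict[str, str] = {}
    for part in parts[1:]:  # parts[0] is the template name
        key, sep, value = part.partition("=")
        if sep:  # positional (unnamed) parameters are skipped
            params[key.strip()] = value.strip()
    return params
```

`raw_template_url("hydrogen")` reproduces the URL above. Values are returned as raw wikitext, so links and nested templates are left unexpanded.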
If you are looking for JSON, use this URL instead, although the output is not pretty.
https://en.wikipedia.org/w/api.php?action=parse&page=Template:Infobox%20hydrogen&format=json
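That JSON wraps the rendered page HTML. In the default output format it sits under `parse.text["*"]`, which is where the earlier tip about pulling out the `table` with class `infobox` applies. Below is a dependency-free sketch that swaps BeautifulSoup for the standard library's `html.parser`. It assumes the infobox is the first table whose `class` list contains `infobox`, and that each data row pairs a `th` label with a `td` value. `InfoboxRows` and `infobox_rows` are hypothetical names.

```python
import json
from html.parser import HTMLParser
from typing import Dict, List, Optional


class InfoboxRows(HTMLParser):
    """Collect label -> value text from the first <table> whose class includes 'infobox'."""

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True: entities such as &nbsp; arrive decoded
        self.table_depth = 0  # 1 inside the infobox, >1 inside tables nested in it
        self.finished = False
        self.cell: Optional[str] = None  # "th" or "td" while inside a top-level cell
        self.label: List[str] = []
        self.value: List[str] = []
        self.rows: Dict[str, str] = {}

    def _append(self, text: str) -> None:
        if self.table_depth and self.cell == "th":
            self.label.append(text)
        elif self.table_depth and self.cell == "td":
            self.value.append(text)

    def handle_starttag(self, tag, attrs):
        if self.finished:
            return
        if tag == "table":
            if self.table_depth:
                self.table_depth += 1  # table nested inside the infobox
            elif "infobox" in (dict(attrs).get("class") or "").split():
                self.table_depth = 1  # the infobox itself
        elif tag == "br":
            self._append(" ")  # keep line-broken values apart
        elif self.table_depth == 1:
            if tag == "tr":
                self.cell, self.label, self.value = None, [], []
            elif tag in ("th", "td"):
                self.cell = tag

    def handle_endtag(self, tag):
        if self.finished or not self.table_depth:
            return
        if tag == "table":
            self.table_depth -= 1
            self.finished = self.table_depth == 0
        elif self.table_depth == 1:
            if tag in ("th", "td"):
                self.cell = None
            elif tag == "tr":
                label = " ".join("".join(self.label).split())
                value = " ".join("".join(self.value).split())
                if label and value:  # skip title rows and image-only rows
                    self.rows[label] = value

    def handle_data(self, data):
        self._append(data)


def infobox_rows(parse_json: str) -> Dict[str, str]:
    """Pull the rendered HTML out of an action=parse JSON reply and read its infobox."""
    html = json.loads(parse_json)["parse"]["text"]["*"]  # default (formatversion=1) shape
    parser = InfoboxRows()
    parser.feed(html)
    parser.close()
    return parser.rows
```

Values are flattened to their visible text, so links and lists inside a cell come back as plain strings. Any table after the infobox is ignored.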