Retrieve first paragraph of Wikipedia article

Posted 2019-07-13 01:34

Question:

I've been trying to understand the MediaWiki documentation for the past 2 days and I can't figure out how to retrieve the first paragraph of a Wikipedia article through the MediaWiki API.

Could someone point me in the right direction?

I am about to resort to file_get_contents, but I'm confident there's a "cleaner" solution.

Answer 1:

Don't try to use the raw API; use a client wrapper instead. Here's a long list to choose from, all for PHP:

http://en.wikipedia.org/wiki/Wikipedia:PHP_bot_framework_table
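
For readers who still want to see what a wrapper would be doing underneath, here is a minimal sketch of the raw request, assuming the TextExtracts module (prop=extracts) that Wikipedia exposes through api.php. The helper names wikiIntroUrl and wikiIntroFromResponse are made up for this illustration, and the assumption that paragraphs in the plain-text extract are separated by newlines should be checked against a real response.

```php
<?php
// Hypothetical helper names; neither function comes from the thread or from MediaWiki.

// Build a TextExtracts query URL for the lead section of one article.
function wikiIntroUrl(string $title, string $lang = 'en'): string
{
    $params = [
        'action'      => 'query',
        'prop'        => 'extracts',
        'exintro'     => 1,   // only the content before the first section heading
        'explaintext' => 1,   // plain text instead of HTML
        'redirects'   => 1,   // follow redirects to the target article
        'format'      => 'json',
        'titles'      => $title,
    ];
    return "https://{$lang}.wikipedia.org/w/api.php?" . http_build_query($params);
}

// Pull the first paragraph out of a decoded API response, or null if there is none.
function wikiIntroFromResponse(array $response): ?string
{
    $pages = $response['query']['pages'] ?? [];
    $page = reset($pages);   // pages are keyed by page ID; take the first entry
    if ($page === false || !isset($page['extract'])) {
        return null;
    }
    // Assumption: the intro may hold several paragraphs separated by newlines.
    $paragraphs = preg_split('/\n+/', trim($page['extract']));
    return $paragraphs[0] !== '' ? $paragraphs[0] : null;
}

// Usage (needs network access; the title is only an example):
// $json = file_get_contents(wikiIntroUrl('Albert Einstein'));
// echo wikiIntroFromResponse(json_decode($json, true));
```

Wikimedia asks API clients to send a descriptive User-Agent header, so a plain file_get_contents call may need a stream context that sets one.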



Answer 2:

file_get_contents is pretty clean: you get the HTML code. You can then parse the HTML using DOMDocument. DOMDocument works much like the DOM in JavaScript: you can fetch all the <p> elements in a div, for example, or grab just the first one.

For example:

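// 'the url' is a placeholder; substitute the address of the article page.
$html = file_get_contents('the url');

$dom = new DOMDocument();
// Collect libxml parse warnings internally instead of silencing them with @.
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();

// Text content of the first <p> element in the document.
$p = $dom->getElementsByTagName('p')->item(0)->nodeValue;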
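
On a real article page the very first <p> is not always the lead paragraph; it can be empty or sit outside the article body. As a rough sketch of the "all <p>'s in a div" variant, the function below uses DOMXPath to return the first non-empty paragraph inside a given container. The name firstParagraph is hypothetical, and the container id 'mw-content-text' in the usage comment is an assumption about Wikipedia's current markup that should be verified against the page source.

```php
<?php
// firstParagraph is a hypothetical helper name, not part of the original answer.
// Assumes $html is non-empty and $containerId contains no single quote.
function firstParagraph(string $html, string $containerId): ?string
{
    $dom = new DOMDocument();
    // Keep libxml warnings about sloppy HTML out of the output.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    // All <p> elements anywhere below the element with the given id.
    $nodes = $xpath->query("//*[@id='" . $containerId . "']//p");
    foreach ($nodes as $node) {
        $text = trim($node->textContent);
        if ($text !== '') {
            return $text;   // skip empty or whitespace-only paragraphs
        }
    }
    return null;   // container missing or no paragraph with text
}

// Usage (needs network access; the id is an assumption about the page markup):
// $html = file_get_contents('the url');
// echo firstParagraph($html, 'mw-content-text');
```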