How do you parse and process HTML/XML in PHP?

Posted 2018-12-31 00:06

How can one parse HTML/XML and extract information from it?

29 answers
唯独是你
#2 · 2018-12-31 00:35

JSON and array from XML in three lines:

$xml = simplexml_load_string($xml_string);
$json = json_encode($xml);
$array = json_decode($json,TRUE);

Ta da!
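
To make the shape of the result concrete, here is a minimal sketch of the same conversion run on a small sample document. The sample XML string and the error check are assumptions added for illustration; they are not part of the answer above.

```php
<?php
// Hypothetical sample input; any well-formed XML string would do.
$xml_string = '<book id="42"><title>PHP Basics</title><tag>xml</tag><tag>html</tag></book>';

// simplexml_load_string() returns false when the input is not well-formed.
$xml = simplexml_load_string($xml_string);
if ($xml === false) {
    throw new RuntimeException('Could not parse the XML string');
}

// Round-trip through JSON to get a plain nested PHP array.
$json  = json_encode($xml);
$array = json_decode($json, true);

print_r($array);
```

With this input, the attribute ends up under an "@attributes" key, the repeated tag elements become a list, and the name of the root element is not preserved. A child element that occurs only once becomes a plain string rather than a list, so code that consumes the array should probably handle both shapes.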

与君花间醉酒
#3 · 2018-12-31 00:37

For 1a and 2: I would vote for the Symfony component DomCrawler. It supports queries similar to CSS selectors. Take a look at this presentation for real-world examples: news-of-the-symfony2-world.

The component is designed to work standalone and can be used without Symfony.

The only drawback is that it requires PHP 5.3 or newer.

皆成旧梦
#4 · 2018-12-31 00:37

By the way, this is commonly referred to as screen scraping. The library I have used for it is Simple HTML DOM Parser.

浅入江南
#5 · 2018-12-31 00:37

You could try using something like HTML Tidy to clean up any "broken" HTML and convert it to XHTML, which you can then parse with an XML parser.
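
A rough sketch of that two-step approach, assuming PHP's tidy extension is installed. The helper name firstBoldText and the sample markup are hypothetical, and the two Tidy options shown are just one reasonable configuration.

```php
<?php
// Hypothetical helper: repairs broken HTML with Tidy, then reads the
// first <b> element with SimpleXML. Returns null when the tidy
// extension is missing or nothing usable is found.
function firstBoldText(string $brokenHtml): ?string
{
    if (!extension_loaded('tidy')) {
        return null; // assumption: callers treat null as "not available"
    }

    $xhtml = tidy_repair_string($brokenHtml, [
        'output-xhtml'     => true, // emit well-formed XHTML
        'numeric-entities' => true, // avoid named entities an XML parser may reject
    ], 'utf8');
    if (!is_string($xhtml)) {
        return null;
    }

    $doc = simplexml_load_string($xhtml);
    if ($doc === false) {
        return null;
    }

    // Tidy's XHTML output uses the XHTML namespace, so XPath needs a prefix.
    $doc->registerXPathNamespace('x', 'http://www.w3.org/1999/xhtml');
    $nodes = $doc->xpath('//x:b');

    return $nodes ? (string) $nodes[0] : null;
}

// Sample input with unclosed tags.
$text = firstBoldText('<p>Unclosed paragraph<br><b>bold text');
var_dump($text);
```

The namespace registration is the part that is easy to miss: without it, an XPath query such as //b would likely match nothing in the XHTML that Tidy produces.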

永恒的永恒
#6 · 2018-12-31 00:40

I recommend PHP Simple HTML DOM Parser.

It really has nice features, like:

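// Assumes simple_html_dom.php has already been included.
$html = file_get_html('http://www.example.com/');
foreach ($html->find('img') as $element)
    echo $element->src . '<br>';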
余生无你
#7 · 2018-12-31 00:40

Advanced Html Dom is a replacement for Simple HTML DOM that offers the same interface, but it is DOM-based, so it avoids the memory issues associated with Simple HTML DOM.

It also has full CSS support, including jQuery extensions.
