Using DOMDocument to extract from an HTML document by class

Posted 2020-02-12 09:22

Question:

The DOMDocument class has methods to get elements by ID and by tag name (getElementById and getElementsByTagName), but not by class. Is there a way to do this?

As an example, how would I select the div from the following markup?

<html>
...
<body>
...
<div class="foo">
...
</div>
...
</body>
</html>

Answer 1:

The simple answer is to use XPath:

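$dom = new DOMDocument();
$dom->loadHTML($html);   // $html holds the markup to search
$xpath = new DOMXPath($dom);
// First element whose class attribute is exactly "foo" (null if there is none)
$div = $xpath->query('//*[@class="foo"]')->item(0);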

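However, that query only matches elements whose class attribute is exactly "foo", so it misses an element such as class="foo bar". To match one class within a space-separated list, use this query (here with foo as the class name):

//*[contains(concat(' ', normalize-space(@class), ' '), ' foo ')]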
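The token-matching query can be wrapped in a small reusable function. This is a minimal sketch: `elementsByClass` is a hypothetical name (DOMDocument has no such method), and it assumes the class name contains no quote characters, because the name is interpolated directly into the XPath expression.

```php
<?php
// Hypothetical helper (not part of DOMDocument): returns every element whose
// space-separated class list contains $className as a whole token.
function elementsByClass(DOMDocument $dom, string $className): DOMNodeList
{
    $xpath = new DOMXPath($dom);
    // Pad the normalized class list with spaces so ' foo ' matches only the
    // whole token, not "foobar". Assumes $className contains no quotes.
    $query = sprintf(
        "//*[contains(concat(' ', normalize-space(@class), ' '), ' %s ')]",
        $className
    );
    return $xpath->query($query);
}

$html = '<html><body>'
    . '<div class="foo">A</div>'
    . '<div class="bar foo">B</div>'
    . '<div class="foobar">C</div>'
    . '<div class="bar">D</div>'
    . '</body></html>';

$dom = new DOMDocument();
$dom->loadHTML($html);

$texts = [];
foreach (elementsByClass($dom, 'foo') as $node) {
    $texts[] = $node->textContent;
}
echo implode(',', $texts), PHP_EOL; // A,B
```

Elements without a class attribute are safe here: normalize-space() of a missing attribute is the empty string, so the padded value is just two spaces and never contains ' foo '.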


Answer 2:

$html = '<html><body><div class="foo">Test</div><div class="foo">ABC</div><div class="foo">Exit</div><div class="bar"></div></body></html>';

$dom = new DOMDocument();
@$dom->loadHTML($html); // @ suppresses libxml warnings about malformed HTML

$xpath = new DOMXPath($dom);

$allClass = $xpath->query("//@class");              // every class attribute node
$allClassBar = $xpath->query("//*[@class='bar']");  // elements whose class is exactly 'bar'

echo "There are " . $allClass->length . " elements with a class attribute<br>";

echo "There are " . $allClassBar->length . " elements with a class attribute of 'bar'<br>";
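Counting is only half the job. To read the matched elements, you can iterate the DOMNodeList that query() returns. This sketch reuses the same markup. It swaps the @ operator for libxml_use_internal_errors(), which collects parser warnings instead of silencing every error raised by the call.

```php
<?php
$html = '<html><body><div class="foo">Test</div><div class="foo">ABC</div><div class="foo">Exit</div><div class="bar"></div></body></html>';

$dom = new DOMDocument();
// Collect libxml warnings internally instead of printing them.
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

// query() returns a DOMNodeList, which can be iterated with foreach.
$texts = [];
foreach ($xpath->query("//div[@class='foo']") as $div) {
    $texts[] = $div->textContent;
}

echo count($texts), ' divs: ', implode(', ', $texts), PHP_EOL; // 3 divs: Test, ABC, Exit
```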


Answer 3:

In addition to ircmaxell's answer, if you need to select by a space-separated class:

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$classname = 'foo';
// contains() is a substring test, so this also matches e.g. class="foobar"
$div = $xpath->query("//div[contains(@class, '$classname')]")->item(0);
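contains() tests for a substring, so a plain contains(@class, ...) query also matches classes such as "foobar". The sketch below compares it with the space-padded whole-token query. The markup is made up for illustration.

```php
<?php
$html = '<html><body>'
    . '<div class="foobar">wrong</div>'
    . '<div class="foo bar">right</div>'
    . '</body></html>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$classname = 'foo';

// Plain substring test: matches both "foobar" and "foo bar".
$substring = $xpath->query("//div[contains(@class, '$classname')]");

// Whole-token test: pad the normalized class list with spaces before searching.
$token = $xpath->query(
    "//div[contains(concat(' ', normalize-space(@class), ' '), ' $classname ')]"
);

echo $substring->length, ' vs ', $token->length, PHP_EOL; // 2 vs 1
echo $token->item(0)->textContent, PHP_EOL;                // right
```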