How to get the first level of DOM elements with DOMDocument

Posted 2020-02-21 08:51

Question:

How can I get the first level of DOM elements with PHP's DOMDocument?

Example code that does not work, taken from this Q&A: http://stackoverflow.com/questions/1540302/how-to-get-nodes-in-first-level-using-php-domdocument

<?php
$str=<<< EOD
<div id="header">
</div>
<div id="content">
    <div id="sidebar">
    </div>
    <div id="info">
    </div>
</div>
<div id="footer">
</div>
EOD;

$doc = new DOMDocument();
$doc->loadHTML($str);
$xpath = new DOMXpath($doc);
$entries = $xpath->query("/");
foreach ($entries as $entry) {
    var_dump($entry->firstChild->nodeValue);
}
?>

Thanks, Yosef

Answer 1:

The first level of elements below the root node can be accessed with

$dom->documentElement->childNodes

The childNodes property contains a DOMNodeList, which you can iterate with foreach.

See DOMDocument::documentElement

This is a convenience attribute that allows direct access to the child node that is the document element of the document.

and DOMNode::childNodes

A DOMNodeList that contains all children of this node. If there are no children, this is an empty DOMNodeList.

Since childNodes is a property of DOMNode, any class extending DOMNode (which is most of the classes in DOM) has this property. To get the first level of elements below a DOMElement, access that DOMElement's childNodes property.
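
As a minimal sketch of reading childNodes from a specific DOMElement, the snippet below looks up the div with id "content" from the question's markup and lists the ids of its direct element children. The variable names ($html, $content, $ids) and the XPath expression used to locate the element are illustrative assumptions, not part of the original answer.

```php
<?php
// Sketch: list the direct element children of one specific DOMElement.
$html = <<<EOD
<div id="header">
</div>
<div id="content">
    <div id="sidebar">
    </div>
    <div id="info">
    </div>
</div>
<div id="footer">
</div>
EOD;

$dom = new DOMDocument;
$dom->loadHTML($html);

// Locate the element whose first-level children we want (assumed lookup).
$xpath = new DOMXPath($dom);
$content = $xpath->query('//div[@id="content"]')->item(0);

$ids = [];
foreach ($content->childNodes as $child) {
    // childNodes also holds text nodes; keep element nodes only.
    if ($child->nodeType === XML_ELEMENT_NODE) {
        $ids[] = $child->getAttribute('id');
    }
}

echo implode(', ', $ids), PHP_EOL;
```

Only the direct children of the selected element are visited; anything nested deeper would need another pass over that child's own childNodes.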


Note that if you use DOMDocument::loadHTML() on invalid HTML or partial documents, the HTML parser module will add an HTML skeleton with html and body tags, so in the DOM tree, the HTML in your example will be

<!DOCTYPE html … ">
<html><body><div id="header">
</div>
<div id="content">
    <div id="sidebar">
    </div>
    <div id="info">
    </div>
</div>
<div id="footer">
</div></body></html>

which you have to take into account when traversing or using XPath. Consequently, using

$dom = new DOMDocument;
$dom->loadHTML($str);
foreach ($dom->documentElement->childNodes as $node) {
    echo $node->nodeName; // body
}

will only iterate the <body> DOMElement node. Knowing that libxml will add the skeleton, you will have to iterate over the childNodes of the <body> element to get the div elements from your example code, e.g.

$dom->getElementsByTagName('body')->item(0)->childNodes

However, childNodes also includes any whitespace text nodes, so you either have to set preserveWhiteSpace to false or check each node's nodeType if you only want to get DOMElement nodes, e.g.

foreach ($dom->getElementsByTagName('body')->item(0)->childNodes as $node) {
    if ($node->nodeType === XML_ELEMENT_NODE) {
        echo $node->nodeName;
    }
}

or use XPath

$dom = new DOMDocument;
$dom->loadHTML($str);
$xpath = new DOMXPath($dom);
// "*" matches element nodes only, so whitespace text nodes are skipped
foreach ($xpath->query('/html/body/*') as $node) {
    echo $node->nodeName;
}
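
The XPath approach can be wrapped in a small reusable function. This is only a sketch: firstLevelElementNames is a hypothetical helper name, and it assumes the input is an HTML fragment that loadHTML() wraps in html and body elements as described above.

```php
<?php
/**
 * Hypothetical helper: returns "name#id" labels for the top-level
 * elements of an HTML fragment.
 */
function firstLevelElementNames(string $fragment): array
{
    $dom = new DOMDocument;
    $dom->loadHTML($fragment);

    $xpath = new DOMXPath($dom);
    $names = [];
    // "/html/body/*" selects only the element children of <body>.
    foreach ($xpath->query('/html/body/*') as $node) {
        $names[] = $node->nodeName . '#' . $node->getAttribute('id');
    }
    return $names;
}

print_r(firstLevelElementNames(
    '<div id="header"></div><div id="content"><div id="sidebar"></div></div><div id="footer"></div>'
));
```

For the fragment above, the nested sidebar div is not reported because it is a child of the content div rather than of body.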

Additional information:

  • DOMDocument in php
  • Printing content of a XML file using XML DOM