How do I assemble pieces of HTML into a DOMDocument?

Posted 2019-08-01 04:06

It appears that loadHTML and loadHTMLFile, when given files that contain only sections of an HTML document, fill in html and body tags for each section, as revealed when I output the result with the following:

$doc = new DOMDocument();
$doc->loadHTMLFile($file);
$elements = $doc->getElementsByTagName('*');

if( !is_null($elements) ) {
    foreach( $elements as $element ) {
        echo "<br/>". $element->nodeName. ": ";

        $nodes = $element->childNodes;
        foreach( $nodes as $node ) {
            echo $node->nodeValue. "\n";
        }
    }
}

Since I plan to assemble these parts into the larger document within my own code, and I've been instructed to use DOMDocument to do it, what can I do to prevent this behavior?

2 answers
淡お忘
#2 · 2019-08-01 04:34

The closest you can get is to use a DOMDocumentFragment.

Then you can do:

$doc = new DOMDocument();
...
$f = $doc->createDocumentFragment();
$f->appendXML("<foo>text</foo><bar>text2</bar>"); 
$someElement->appendChild($f);

However, this expects XML, not HTML.
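
To make the XML-versus-HTML limitation concrete, here is a minimal, self-contained sketch. The wrapping div and the sample strings are made up for illustration. The first call passes well-formed XML; the second passes ordinary HTML with an unclosed br, which is not well-formed XML, so appendXML() is expected to return false.

```php
<?php
// Hypothetical container element to attach fragments to.
$doc  = new DOMDocument();
$root = $doc->appendChild($doc->createElement('div'));

// Collect libxml parser errors internally instead of emitting warnings.
libxml_use_internal_errors(true);

// Well-formed XML: the fragment parses and can be appended.
$ok       = $doc->createDocumentFragment();
$okResult = $ok->appendXML('<foo>text</foo><bar>text2</bar>');
if ($okResult) {
    $root->appendChild($ok);
}

// Ordinary HTML with an unclosed <br> is not well-formed XML.
$bad       = $doc->createDocumentFragment();
$badResult = $bad->appendXML('<p>line one<br>line two</p>');

var_dump($okResult, $badResult);
echo $doc->saveXML($root), "\n";
```

So a fragment only helps when the partial files are known to be well-formed XML.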

In any case, I think you're creating an artificial problem. Since you know that the parser adds the html and body tags, you can extract the elements from inside the body tag and then import them into the DOMDocument where you're assembling the final file. See DOMDocument::importNode.
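
A rough sketch of that approach follows. The fragment string, the skeleton of the final document, and the variable names are assumptions for illustration; in practice the fragment would come from loadHTMLFile($file).

```php
<?php
// Hypothetical fragment; stands in for the contents of one partial file.
$fragmentHtml = '<p>First part</p><p>Second part</p>';

// Parse the fragment; libxml wraps it in html and body elements.
$part = new DOMDocument();
$part->loadHTML($fragmentHtml);
$partBody = $part->getElementsByTagName('body')->item(0);

// The document being assembled (this skeleton is an assumption).
$final = new DOMDocument();
$final->loadHTML(
    '<html><head><title>Assembled</title></head>'
    . '<body><div id="content"></div></body></html>'
);
$target = $final->getElementsByTagName('div')->item(0);

// Copy every child of the fragment's body into the target document.
foreach ($partBody->childNodes as $child) {
    // importNode(..., true) creates a deep copy owned by $final;
    // the original node stays in $part, so the loop is safe.
    $target->appendChild($final->importNode($child, true));
}

echo $final->saveHTML();
```

Because importNode copies rather than moves the node, the same parsed fragment can be imported into several places if needed.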

手持菜刀,她持情操
#3 · 2019-08-01 04:47

This is one of several modifications that the HTML parser module of libxml makes to the document in order to cope with broken HTML. It only occurs when using loadHTML and loadHTMLFile on partial markup. If you know the partial is valid X(HT)ML, use load and loadXML instead.

You could use

$doc->saveXml($doc->getElementsByTagName('body')->item(0));

to dump the outerHTML of the body element, e.g. <body>anything else</body>, and then strip the body tags with str_replace or extract the inner HTML with substr:

$html = '<p>I am a fragment</p>';
$dom = new DOMDocument;
$dom->loadHTML($html); // added html and body tags
echo substr(
    $dom->saveXml(
        $dom->getElementsByTagName('body')->item(0)
    ),
    6, -7
);
// <p>I am a fragment</p>

Note that this will produce XHTML-compliant markup, so <br> would become <br/>. As of PHP 5.3.5, there is no way to pass a node to saveHTML(), and a feature request has been filed for it. From PHP 5.3.6 onward, saveHTML() accepts an optional node argument.
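
For readers on PHP 5.3.6 or later, where DOMDocument::saveHTML() takes an optional node, the inner HTML of body can be collected without the substr offsets and without the XHTML-style serialization. This is a sketch under that version assumption; the sample markup is made up.

```php
<?php
// Assumes PHP >= 5.3.6 (saveHTML() with a node argument).
$html = '<p>I am a fragment</p><p>Line one<br>line two</p>';

$dom = new DOMDocument();
$dom->loadHTML($html); // libxml adds the html and body wrappers

$body = $dom->getElementsByTagName('body')->item(0);

// Serialize each child of body as HTML and concatenate the pieces.
$inner = '';
foreach ($body->childNodes as $child) {
    $inner .= $dom->saveHTML($child);
}

echo $inner, "\n";
```

Serializing the children rather than the body element itself means there are no wrapper tags to strip afterwards.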
