It appears that loadHTML and loadHTMLFile, when given files representing sections of an HTML document, fill in html and body tags for each section, as revealed when I output with the following:
$doc = new DOMDocument();
$doc->loadHTMLFile($file);
$elements = $doc->getElementsByTagName('*');
if( !is_null($elements) ) {
    foreach( $elements as $element ) {
        echo "<br/>". $element->nodeName. ": ";
        $nodes = $element->childNodes;
        foreach( $nodes as $node ) {
            echo $node->nodeValue. "\n";
        }
    }
}
Since I plan to assemble these parts into the larger document within my own code, and I've been instructed to use DOMDocument to do it, what can I do to prevent this behavior?
The closest you can get is to use a DOMDocumentFragment and append the markup to it with its appendXML method. However, this expects XML, not HTML.
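A minimal sketch of the fragment approach, assuming the partial is well-formed XML. The $partial string and the article container are hypothetical and only stand in for your own section files and target document:

```php
<?php
// Hypothetical partial markup; appendXML() only accepts well-formed XML.
$partial = '<div id="section"><p>First paragraph</p><br/></div>';

$doc = new DOMDocument();
// Hypothetical container element that the sections are assembled into.
$root = $doc->appendChild($doc->createElement('article'));

// A fragment holds nodes without needing a single root element of its own.
$fragment = $doc->createDocumentFragment();
if ($fragment->appendXML($partial)) {
    // Appending a fragment moves its children into the target node.
    $root->appendChild($fragment);
}

// Passing a node to saveXML() serialises just that node, with no html or body wrapper.
$output = $doc->saveXML($root);
echo $output, "\n";
```

If the partial is not well-formed (for example an unclosed br tag), appendXML() should return false and nothing is appended, which is why this route only suits XML-clean sections.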
In any case, I think you're creating an artificial problem. Since you know the behavior is to create the html and body tags, you can just extract the elements in the file from within the body tag and then import them into the DOMDocument where you're assembling the final file. See DOMDocument::importNode.
This is one of several modifications that the HTML parser module of libxml makes to the document in order to work with broken HTML. It only occurs when using loadHTML and loadHTMLFile on partial markup. If you know the partial is valid X(HT)ML, use load and loadXML instead.
You could use saveXML(), passing it the body node, to dump the outerHTML of the body element, e.g. <body>anything else</body>, and strip the body element with str_replace or extract the inner HTML with substr.
Note that this will use XHTML-compliant markup, so <br> would become <br/>. As of PHP 5.3.5, there is no way to pass a node to saveHTML(). A bug request has been filed.
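The body-extraction approach described above might look roughly like the following. This is only a sketch: the $partial string and the div container are hypothetical, and it assumes the partial contains plain HTML 4 elements that loadHTML parses without warnings.

```php
<?php
// Hypothetical partial: a section of a page with no html or body wrapper.
$partial = '<h2>Section</h2><p>Line one<br>line two</p>';

// Parse the partial; the libxml HTML parser adds html and body elements.
$part = new DOMDocument();
$part->loadHTML($partial);
$body = $part->getElementsByTagName('body')->item(0);

// Target document the sections are assembled into (its structure is assumed).
$final = new DOMDocument();
$container = $final->appendChild($final->createElement('div'));

// Copy each child of the generated body into the target document.
foreach ($body->childNodes as $child) {
    // The second argument requests a deep copy, so descendants are included.
    $container->appendChild($final->importNode($child, true));
}
$assembled = $final->saveXML($container);
echo $assembled, "\n";

// Alternative: serialise the body node and strip its tags from the string.
$inner = str_replace(array('<body>', '</body>'), '', $part->saveXML($body));
echo $inner, "\n";
```

importNode copies nodes rather than moving them, so the parsed partial is left intact and the loop does not disturb the list it iterates over. The string-based alternative is simpler but relies on the body tag being serialised without attributes.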