I'm importing some arbitrary HTML into a DOMDocument using the loadHTML() function, e.g.:
$html = '<p><a href="test.php">Test</a></p>';
$doc = new DOMDocument;
$doc->loadHTML($html);
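To illustrate the wrapper the question is about, here is a minimal sketch that dumps the whole document straight after loading the fragment. With typical libxml builds the result begins with an HTML 4.0 Transitional DOCTYPE followed by <html><body>, although the exact text may vary between libxml versions:

```php
<?php
// Load a fragment that has no <html> or <body> of its own.
$doc = new DOMDocument;
$doc->loadHTML('<p><a href="test.php">Test</a></p>');

// Serializing the whole document shows the elements the parser implied.
$html = $doc->saveHTML();
echo $html;
```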
I then want to change a few attributes and node values using DOMDocument methods, which I can do without any problem.
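For reference, an edit of this kind might look like the sketch below. The new href value and link text are hypothetical, chosen only for illustration:

```php
<?php
$doc = new DOMDocument;
$doc->loadHTML('<p><a href="test.php">Test</a></p>');

// Grab the first <a> element and change one attribute (hypothetical value).
$link = $doc->getElementsByTagName('a')->item(0);
$link->setAttribute('href', 'other.php');

// Assigning nodeValue on an element replaces its children with a text node.
$link->nodeValue = 'Other';
```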
Once I've made these changes, I'd like to export the HTML string (using ->saveHTML()) without the <html><body>... tags that DOMDocument automatically adds to the HTML.
I understand why these are added (to ensure a valid document), but how would I go about just getting my edited HTML back (essentially everything between the <body> tags)?
I have read this post, and while it offers some solutions, I would rather do this 'properly', i.e. without using a string replace on the <body> tags. Validity of the HTML is not an issue, as it's run through an HTML purifier beforehand.
Any ideas? Thanks.
EDIT
I'm aware of the $node parameter added to saveHTML() in PHP 5.3.6; unfortunately, I'm stuck with 5.2.
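For readers who are on PHP 5.3.6 or later, the $node parameter allows a generic version that does not depend on knowing which tag comes first: serialize each child of <body> in turn and concatenate the results. This is only a sketch, and it will not run on 5.2:

```php
<?php
$doc = new DOMDocument;
$doc->loadHTML('<p><a href="test.php">Test</a></p>');

// Walk every child of <body> (elements, text, comments alike) and
// serialize each one on its own, so the wrapper is never emitted.
$body = $doc->getElementsByTagName('body')->item(0);
$out = '';
foreach ($body->childNodes as $child) {
    $out .= $doc->saveHTML($child); // $node argument needs PHP >= 5.3.6
}
echo $out;
```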
Thanks, but I won't necessarily know the type of the first tag in the body; it needs to be generic.
Perhaps the source code of this will help. They're using a regex to strip out the unnecessary strings:
http://beerpla.net/projects/smartdomdocument-a-smarter-php-domdocument-class/
saveHTMLExact() - DOMDocument has an extremely badly designed "feature" where if the HTML code you are loading does not contain <html> and <body> tags, it adds them automatically (yup, there are no flags to turn this behavior off). Thus, when you call $doc->saveHTML(), your newly saved content now has <html><body> and DOCTYPE in it. Not very handy when trying to work with code fragments (XML has a similar problem). SmartDOMDocument contains a new function called saveHTMLExact() which does exactly what you would want – it saves HTML without adding that extra garbage that DOMDocument does.
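The quoted description does not show the actual code, so what follows is only a rough sketch of the regex idea and not SmartDOMDocument's implementation. The helper name saveHTMLWithoutWrapper() is hypothetical. It deletes the DOCTYPE and the <html>/<body> opening and closing tags from the saveHTML() output:

```php
<?php
// Hypothetical helper, NOT SmartDOMDocument's real saveHTMLExact().
function saveHTMLWithoutWrapper(DOMDocument $doc)
{
    // Remove <!DOCTYPE ...>, <html>, </html>, <body>, </body> (any case)
    // plus the whitespace that follows each of them.
    $stripped = preg_replace('~<(?:!DOCTYPE|/?(?:html|body))[^>]*>\s*~i', '', $doc->saveHTML());
    return trim($stripped);
}

$doc = new DOMDocument;
$doc->loadHTML('<p><a href="test.php">Test</a></p>');
$out = saveHTMLWithoutWrapper($doc);
echo $out;
```

Because this works on the serialized string, it would also strip a literal <body> or <html> appearing inside a comment or <script> block, which is the kind of fragility the question is trying to avoid.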
Also, other questions have asked similar things:
How to saveHTML of DOMDocument without HTML wrapper?
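Try passing the node you want to DOMDocument->saveXML(); its optional $node parameter is already available in PHP 5.2. Called with no argument, saveXML() serializes the whole document, wrapper included, but $doc->saveXML($doc->getElementsByTagName('p')->item(0)) outputs
<p><a href="test.php">Test</a></p>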
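Since saveXML() accepts a node even on PHP 5.2, the same idea can be made generic by serializing every child of <body>, whatever tags it contains. A minimal sketch follows. One caveat: this is XML serialization, so the markup can differ from saveHTML() output, for example a void element such as <br> typically comes out as <br/>.

```php
<?php
$doc = new DOMDocument;
$doc->loadHTML('<p><a href="test.php">Test</a></p>');

// Serialize each child of <body> individually with saveXML($node),
// which is available in PHP 5.2, and join the pieces.
$body = $doc->getElementsByTagName('body')->item(0);
$out = '';
foreach ($body->childNodes as $child) {
    $out .= $doc->saveXML($child);
}
echo $out;
```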