I've been developing PHP applications for quite a while now, but this one really has me stumped. I'm loading complete HTML pages using DOMDocument. These pages are external and may contain JavaScript, which is beyond my control.
On some pages, basic HTML formatting inside JavaScript strings was not rendered the way it was supposed to be. I've written an example that demonstrates the problem.
<?php
$html = new DOMDocument();
libxml_use_internal_errors(true);
$strPage = '<html>
<head>
<title>Demo</title>
<script type="text/javascript">
var strJS = "<b>This is bold.</b><br /><br />This should not be bold. Where did my closing tag go to?";
</script>
</head>
<body>
<script type="text/javascript">
document.write(strJS);
</script>
</body>
</html>';
$html->loadHTML($strPage);
echo $html->saveHTML();
exit;
?>
Am I missing something?
Edit: I've changed the demo. Changing loadHTML to loadXML no longer works with it, and the output of the demo now passes W3C validation. Adding a CDATA block to the JavaScript doesn't seem to have any effect either.
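Because libxml_use_internal_errors(true) silences parser warnings, it can help to print what the parser actually collected. The following is only a minimal diagnostic sketch using a shortened copy of the demo page. Which messages appear (if any) depends on the libxml2 version PHP was built against, so no particular output is assumed here.

```php
<?php
// Collect parser errors internally instead of emitting PHP warnings.
libxml_use_internal_errors(true);

// Shortened copy of the demo page from the question.
$strPage = '<html><head><title>Demo</title>
<script type="text/javascript">
var strJS = "<b>This is bold.</b><br /><br />This should not be bold.";
</script>
</head><body></body></html>';

$html = new DOMDocument();
$html->loadHTML($strPage);

// Each entry is a LibXMLError object with line, column and message.
$errors = libxml_get_errors();
foreach ($errors as $error) {
    printf(
        "line %d, column %d: %s\n",
        $error->line,
        $error->column,
        trim($error->message)
    );
}

// Reset the internal error buffer for subsequent parses.
libxml_clear_errors();
```

If the parser reports an unexpected end tag inside the script block, that points to the markup in the JavaScript string being interpreted as HTML.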
I don't know why (I tried to find out), but it works if you load the HTML using loadXML instead of loadHTML.
Note, though, that the HTML is actually invalid: everything is in the head.
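As a possible alternative to loadXML, here is a sketch of a workaround that keeps loadHTML. It assumes the cause is that older libxml2 HTML parsers treat "</" followed by a letter inside a script element as an end tag, which is my understanding of the behaviour but is not confirmed in this thread. The helper name escapeScriptEndTags is hypothetical. It rewrites "</" to "<\/" inside script bodies before parsing, which is the same string to JavaScript in ordinary string literals.

```php
<?php
// Hypothetical helper (not from the original post): escape "</" inside
// <script> bodies so the HTML parser cannot mistake it for an end tag.
// The real closing </script> tag is left untouched.
function escapeScriptEndTags(string $html): string
{
    $result = preg_replace_callback(
        '#(<script\b[^>]*>)(.*?)(</script>)#is',
        static function (array $m): string {
            // $m[1] = opening tag, $m[2] = script body, $m[3] = closing tag
            return $m[1] . str_replace('</', '<\/', $m[2]) . $m[3];
        },
        $html
    );

    // preg_replace_callback() returns null on failure; fall back to the input.
    return $result ?? $html;
}

// Shortened copy of the demo page from the question.
$strPage = '<html><head><title>Demo</title>
<script type="text/javascript">
var strJS = "<b>This is bold.</b><br /><br />This should not be bold.";
</script>
</head><body></body></html>';

$escaped = escapeScriptEndTags($strPage);

libxml_use_internal_errors(true);
$html = new DOMDocument();
$html->loadHTML($escaped);
echo $html->saveHTML();
```

This is a regex-based pre-processing step, so it will not cope with every page: a script body that itself contains the literal text "</script>" inside a string, or code that relies on raw strings such as String.raw, would need separate handling.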