A little new to PHP parsing here, but I can't seem to get PHP's DomDocument to return what is clearly an identifiable node. The HTML loaded will come from the 'net so can't necessarily guarantee XML compliance, but I try the following:
<?php
header("Content-Type: text/plain");
$html = '<html><body>Hello <b id="bid">World</b>.</body></html>';
$dom = new DomDocument;
$dom->preserveWhiteSpace = false;
$dom->validateOnParse = true;
/*** load the html into the object ***/
$dom->loadHTML($html);
var_dump($dom);
$belement = $dom->getElementById("bid");
var_dump($belement);
?>
Though I receive no error, I only receive the following as output:
object(DOMDocument)#1 (0) {
}
NULL
Should I not be able to look up the <b> tag, as it does indeed have an id?
Well, you should check whether $dom->loadHTML($html); returns true (success), and I would try printing some output to get a clue about what might be wrong.
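A rough sketch of that kind of check, assuming the HTML from the question. It uses libxml_use_internal_errors() and libxml_get_errors() to collect parser messages; the variable names are only illustrative.

```php
<?php
// Sketch: check loadHTML()'s return value and list what libxml complained about.
$html = '<html><body>Hello <b id="bid">World</b>.</body></html>';

// Collect parser messages instead of emitting PHP warnings.
libxml_use_internal_errors(true);

$dom = new DOMDocument();
$loaded = $dom->loadHTML($html);

if ($loaded === false) {
    echo "loadHTML() failed\n";
}

// Print every message libxml recorded while parsing.
foreach (libxml_get_errors() as $error) {
    echo trim($error->message), " (line {$error->line})\n";
}
libxml_clear_errors();

// Dump the text of the element, or NULL if the lookup failed.
$belement = $dom->getElementById('bid');
var_dump($belement === null ? null : $belement->nodeValue);
```

If the load succeeds and no messages are printed, the markup itself is probably not the problem and the id lookup is the thing to investigate.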
EDIT: http://www.php-editors.com/php_manual/function.domdocument-get-element-by-id.html - it seems that DomDocument uses XPath internally.
Example:
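A minimal sketch of doing the lookup with XPath directly, assuming the same HTML as in the question (the variable names are illustrative):

```php
<?php
// Sketch: find an element by its id attribute with XPath; no DTD or validation involved.
$html = '<html><body>Hello <b id="bid">World</b>.</body></html>';

$dom = new DOMDocument();
$dom->loadHTML($html);

$xpath = new DOMXPath($dom);
// Match any element whose id attribute equals "bid"; item(0) is null when nothing matches.
$belement = $xpath->query('//*[@id="bid"]')->item(0);

if ($belement !== null) {
    echo $belement->nodeValue, "\n";
}
```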
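The Manual explains why: for getElementById() to work, you need either ID attributes set with DOMElement::setIdAttribute() or a DTD that defines an attribute to be of type ID. In the latter case, the document has to be validated with DOMDocument::validate() or DOMDocument::validateOnParse before the function is used.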
By all means, go for valid HTML & provide a DTD.
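When a DTD is not an option, the manual also mentions DOMElement::setIdAttribute() as a way to mark an attribute as an ID by hand. A rough sketch, assuming the HTML from the question and that flagging every existing id attribute is acceptable:

```php
<?php
// Sketch: mark the id attributes as IDs by hand, then look the element up.
$html = '<html><body>Hello <b id="bid">World</b>.</body></html>';

$dom = new DOMDocument();
$dom->loadHTML($html);

// Walk all elements and flag each existing "id" attribute as an ID.
foreach ($dom->getElementsByTagName('*') as $element) {
    if ($element->hasAttribute('id')) {
        $element->setIdAttribute('id', true);
    }
}

// Dump the text of the element, or NULL if the lookup still fails.
$belement = $dom->getElementById('bid');
var_dump($belement !== null ? $belement->nodeValue : null);
```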
Quick fixes:
1. Call $dom->validate(); and put up with the errors (or fix them). Afterwards you can use $dom->getElementById(), regardless of the errors for some reason.
2. Use XPath instead: $x = new DOMXPath($dom); $el = $x->query("//*[@id='bid']")->item(0);
3. Set validateOnParse to true before loading the HTML; it would also work ;P.
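A minimal sketch of the validateOnParse fix, reusing the HTML from the question. The property is set before loadHTML() because loading is the point at which the document is parsed; the null check is an added precaution, not part of the fix itself.

```php
<?php
$html = '<html><body>Hello <b id="bid">World</b>.</body></html>';

$dom = new DOMDocument();
$dom->validateOnParse = true;      // set this first
$dom->loadHTML($html);             // loading is when the parsing happens
$dom->preserveWhiteSpace = false;

$belement = $dom->getElementById("bid");
// Guard against a failed lookup instead of dereferencing null.
echo $belement !== null ? $belement->nodeValue : 'not found';
```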
Outputs 'World' here.