PHP HTML DomDocument getElementById problems

I'm fairly new to parsing in PHP, and I can't get PHP's DomDocument to return a node that is clearly identifiable. The HTML will be loaded from the 'net, so I can't guarantee XML compliance. This is what I tried:

<?php
header("Content-Type: text/plain");

$html = '<html><body>Hello <b id="bid">World</b>.</body></html>';

$dom = new DomDocument;
$dom->preserveWhiteSpace = false;
$dom->validateOnParse = true;

/*** load the html into the object ***/
$dom->loadHTML($html);
var_dump($dom);    

$belement = $dom->getElementById("bid");
var_dump($belement);

?>

Though I receive no error, I only receive the following as output:

object(DOMDocument)#1 (0) {
}
NULL

Should I not be able to look up the <b> tag as it does indeed have an id?

Tags: php html parsing

2 Answers

贪生不怕死

#2 · 2019-01-04 14:13

Well, you should check whether $dom->loadHTML($html); returns true (success), and I would try

 var_dump($belement->nodeValue);

for output to get a clue what might be wrong.
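As a rough sketch of that check: the snippet below tests the return value of loadHTML() and prints whatever libxml collected while parsing. The variable name $loaded and the message strings are made up for illustration. libxml_use_internal_errors(), libxml_get_errors() and libxml_clear_errors() are standard libxml functions in PHP.

```php
<?php
// Collect libxml parse problems instead of letting them surface as PHP warnings.
libxml_use_internal_errors(true);

$html = '<html><body>Hello <b id="bid">World</b>.</body></html>';

$dom = new DOMDocument();
$loaded = $dom->loadHTML($html); // true on success, false on failure

// Print whatever the parser complained about (may be nothing at all).
foreach (libxml_get_errors() as $error) {
    echo 'libxml: ', trim($error->message), "\n";
}
libxml_clear_errors();

if ($loaded === false) {
    echo "loadHTML() failed\n";
} else {
    $belement = $dom->getElementById('bid');
    if ($belement === null) {
        // Guard against NULL before touching ->nodeValue.
        echo "no element is registered under that ID\n";
    } else {
        var_dump($belement->nodeValue);
    }
}
```

Checking for NULL first avoids the notice you would otherwise get from reading nodeValue on a non-object.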

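EDIT: http://www.php-editors.com/php_manual/function.domdocument-get-element-by-id.html - that page covers the old PHP 4 DOM XML extension, which did the lookup with an XPath search internally. With the current DOM extension you can run the same kind of query yourself through DOMXPath. Note that attribute names in XPath are case-sensitive, so it has to be @id, not @ID.

Example:

$xpath = new DOMXPath($dom);
var_dump($xpath->query("//*[@id='YOURIDGOESHERE']")->item(0));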


倾城　Initia

#3 · 2019-01-04 14:25

The Manual explains why:

For this function to work, you will need either to set some ID attributes with DOMElement->setIdAttribute() or a DTD which defines an attribute to be of type ID. In the later case, you will need to validate your document with DOMDocument->validate() or DOMDocument->validateOnParse before using this function.
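To illustrate the first option from that quote, here is a minimal sketch using DOMElement::setIdAttribute(). It assumes a small XML snippet loaded with loadXML() and no DTD. The <root> wrapper and the variable names are made up for the example.

```php
<?php
$dom = new DOMDocument();
// No DTD here, so nothing tells the parser that "id" is an attribute of type ID.
$dom->loadXML('<root>Hello <b id="bid">World</b>.</root>');

$b = $dom->getElementsByTagName('b')->item(0);

// Look the element up before the attribute has been registered as an ID.
$before = $dom->getElementById('bid');

// Register the existing "id" attribute of this element as an ID attribute.
$b->setIdAttribute('id', true);

// The same lookup after registration.
$after = $dom->getElementById('bid');

var_dump($before === null);
echo $after->nodeValue, "\n";
```

The drawback is that you have to locate each element first in order to register its attribute, so this mostly pays off when you build or walk the tree yourself anyway.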

By all means, go for valid HTML & provide a DTD.

Quick fixes:

1. Call $dom->validate(); and put up with the errors (or fix them). Afterwards you can use $dom->getElementById(), which for some reason works regardless of the errors.
2. Use XPath if you don't feel like validating: $x = new DOMXPath($dom); $el = $x->query("//*[@id='bid']")->item(0);
3. Come to think of it: if you just set validateOnParse to true before loading the HTML, it would also work ;P

$dom = new DOMDocument();
$html ='<html>
<body>Hello <b id="bid">World</b>.</body>
</html>';
$dom->validateOnParse = true; //<!-- this first
$dom->loadHTML($html);        //'cause 'load' == 'parse

$dom->preserveWhiteSpace = false;

$belement = $dom->getElementById("bid");
echo $belement->nodeValue;

Outputs 'World' here.
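The XPath quick fix mentioned above can be wrapped in a small helper. This is only a sketch: findById() is a hypothetical name, not part of the DOM extension, and it simply refuses IDs containing an apostrophe rather than trying to quote them for XPath.

```php
<?php
// Hypothetical helper: find the first element whose id attribute equals $id.
function findById(DOMDocument $dom, string $id): ?DOMElement
{
    // Keep the sketch simple: no attempt to escape apostrophes for XPath.
    if (strpos($id, "'") !== false) {
        return null;
    }

    $xpath = new DOMXPath($dom);
    $nodes = $xpath->query("//*[@id='" . $id . "']");
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }

    $node = $nodes->item(0);
    return $node instanceof DOMElement ? $node : null;
}

$dom = new DOMDocument();
$dom->loadHTML('<html><body>Hello <b id="bid">World</b>.</body></html>');

$el = findById($dom, 'bid');
echo $el !== null ? $el->nodeValue : 'not found', "\n";
```

Because the query matches the plain attribute name, it does not depend on a DTD or on validation, which makes it a reasonable fallback for HTML pulled from arbitrary sites.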
