Using PHP to illustrate: there seems to be a BUG in the normalizeDocument()
method, or a missing "refresh" method, because DOM consistency is lost after changes (even attribute-only changes). So any algorithm "with DOM changes" implemented on top of libxml2 sometimes works and sometimes does not; it looks unpredictable (?)
The "refresh" by $doc->loadXML($doc->saveXML());
is a workaround, but it costs performance in a workflow that keeps working on the DOM. A sub-question: do I need to refresh the DOM after every change?
$XML = '
<html>
<h1>Hello</h1>
<ol>
<li>test (no id)</li>
<li xml:id="i2">test i2</li>
</ol>
</html>
';
$doc = new DOMDocument();
$doc->loadXML($XML);
doSomeChange($doc);                   // here the DOM is modified
print $doc->saveXML();                // show the new DOM state
$doc->normalizeDocument();            // does NOT refresh!?
var_dump($doc->getElementById('i2')); // NULL!? This is the BUG!
// no further doSomeChange($doc)-style changes can be made reliably here
$doc->loadXML($doc->saveXML());       // the only way to refresh?
print $doc->getElementById('i2')->tagName; // OK, it is there
// illustrating attribute modification:
function doSomeChange(DOMDocument $dom) {
    $max = 0;
    $xp = new DOMXPath($dom);
    // iterator_to_array() takes a snapshot of the node list before it is modified
    foreach (iterator_to_array($xp->query('/html/* | //li')) as $e) {
        $max++;
        $e->setAttribute('xml:id', "i$max");
    }
    print "\ncmpDOM='" . ($xp->document === $dom) . "'\n"; // after @ThomasWeinert
}
So the input is $XML and the output is:
<html>
<h1 xml:id="i1">Hello</h1>
<ol xml:id="i2">
<li xml:id="i3">test (no id)</li>
<li xml:id="i4">test i2</li>
</ol>
</html>
NULL
ol
The NULL is the bug (see the code comments).
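To check that the attribute really is in the tree at that point, here is a minimal sketch that looks the element up with XPath instead of getElementById(). findByXmlId() is a hypothetical helper written for this post, not part of the DOM extension, and it assumes the id contains no double quote. The input and the renumbering are the same as above.

```php
<?php
// Hypothetical helper: find an element by its xml:id by walking the live tree
// with XPath, instead of going through getElementById().
// Assumption: $id contains no double quote (it is pasted into the expression).
function findByXmlId(DOMDocument $doc, string $id): ?DOMElement
{
    $xp = new DOMXPath($doc);
    $node = $xp->query('//*[@xml:id="' . $id . '"]')->item(0);
    return $node instanceof DOMElement ? $node : null;
}

// Same input as in the question.
$xml = '<html><h1>Hello</h1><ol><li>test (no id)</li><li xml:id="i2">test i2</li></ol></html>';
$doc = new DOMDocument();
$doc->loadXML($xml);

// Same renumbering as doSomeChange(): i1, i2, ... in document order.
$xp = new DOMXPath($doc);
$n = 0;
foreach (iterator_to_array($xp->query('/html/* | //li')) as $e) {
    $e->setAttribute('xml:id', 'i' . ++$n);
}

$viaXPath = findByXmlId($doc, 'i2');
print ($viaXPath ? $viaXPath->tagName : 'not found') . "\n"; // XPath sees the <ol>
var_dump($doc->getElementById('i2'));                       // compare with the ID lookup
```

The XPath lookup reads the attribute directly from the tree, so it can serve as a cross-check without reloading the document.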
PS: if I change the input line <li xml:id="i2">test i2</li>
to <li>test i2</li>
the algorithm works as expected (!), so the behavior is unpredictable.
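The two cases can be put side by side with a small sketch. lookupAfterRenumber() is a hypothetical helper that applies the same renumbering and reports whether getElementById('i2') finds anything. The result may depend on the PHP and libxml2 versions, so no expected output is given here.

```php
<?php
// Hypothetical helper: load $xml, renumber xml:id as in doSomeChange(),
// then report whether getElementById('i2') returns an element.
function lookupAfterRenumber(string $xml): bool
{
    $doc = new DOMDocument();
    $doc->loadXML($xml);
    $xp = new DOMXPath($doc);
    $n = 0;
    foreach (iterator_to_array($xp->query('/html/* | //li')) as $e) {
        $e->setAttribute('xml:id', 'i' . ++$n);
    }
    return $doc->getElementById('i2') !== null;
}

// The only difference between the inputs is the pre-existing xml:id="i2".
$withId    = '<html><h1>Hello</h1><ol><li>test (no id)</li><li xml:id="i2">test i2</li></ol></html>';
$withoutId = '<html><h1>Hello</h1><ol><li>test (no id)</li><li>test i2</li></ol></html>';

var_dump(lookupAfterRenumber($withId));
var_dump(lookupAfterRenumber($withoutId));
```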
Related questions: In DomDocument, reuse of DOMXpath, it is stable? PHP DomDocument, reuse of XSLTProcessor, it is stable/secure?