
CDATA in SimpleXML opened from XMLReader

Posted 2019-05-10 19:29

Question:

I've got a bunch of XML files which I'm loading into my script using XMLReader, creating a DOM object and then converting it to SimpleXML.

The problem is that one of the XML files uses CDATA, which SimpleXML ignores. Normally, with simplexml_load_file(), I'd add the LIBXML_NOCDATA parameter, but as I'm using simplexml_import_dom() I can't figure out how to ignore the CDATA in the scenario below.

Any ideas please?

Many thanks, Brett

<?php
$file = 'test.xml';
$reader = new XMLReader();
$reader->open($file);
while ($reader->read()) {
    // are we in a product?
    if ($reader->nodeType == XMLReader::ELEMENT &&
        strtolower($reader->localName) == 'product') {
        // expand the node into a DOMNode
        $node = $reader->expand();
        if ($node === false) {
            continue; // expansion failed; skip this element
        }
        // Convert to SimpleXML via DOM: messy, but SimpleXML is so much nicer.
        $dom = new DOMDocument();
        $dom->appendChild($dom->importNode($node, true));
        $products = simplexml_import_dom($dom);

        // do whatever we want to do with the product data
    }
}
$reader->close();

Answer 1:

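You could serialize the DOM back to a string and re-parse it with simplexml_load_string(), which accepts the LIBXML_NOCDATA option. Something like: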

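<?php
// Serialize the DOMDocument built from the expanded node back to XML text
$str = $dom->saveXML();
// Re-parse it; LIBXML_NOCDATA merges CDATA sections into ordinary text nodes
$product = simplexml_load_string($str, 'SimpleXMLElement', LIBXML_NOCDATA | LIBXML_NOBLANKS);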
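The snippet above assumes `$dom` is the DOMDocument built inside the question's loop. A self-contained sketch of the whole round trip might look like the following. It reads from an in-memory string via XMLReader::XML() instead of a file, and the `<catalog>`, `<product>` and `<name>` elements are hypothetical sample data, not the asker's real schema.

```php
<?php
// Hypothetical sample data: the element names are illustrative only.
$xml = '<catalog><product><name><![CDATA[Widget & Co]]></name></product></catalog>';

$reader = new XMLReader();
$reader->XML($xml); // read from a string instead of a file for this sketch

$names = [];
while ($reader->read()) {
    if ($reader->nodeType === XMLReader::ELEMENT && $reader->localName === 'product') {
        $node = $reader->expand();
        if ($node === false) {
            continue; // expansion failed; skip this element
        }
        $dom = new DOMDocument();
        $dom->appendChild($dom->importNode($node, true));

        // Serialize the DOM, then re-parse with LIBXML_NOCDATA so the CDATA
        // section is merged into an ordinary text node.
        $product = simplexml_load_string($dom->saveXML(), 'SimpleXMLElement', LIBXML_NOCDATA);
        $names[] = (string) $product->name;
    }
}
$reader->close();

print_r($names);
```

Re-serializing each product costs an extra parse per element, but it keeps the streaming XMLReader loop intact while giving SimpleXML plain text nodes.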


Answer 2:

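There seems to be a lot of confusion and misinformation about SimpleXML's handling of CDATA nodes. It does not "ignore" CDATA; it simply remembers that a particular node was in CDATA by representing it as an object rather than a plain string. The content is still there, even though debugging functions such as print_r() and var_dump() don't display it.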

If you always follow the good practice of casting SimpleXML's return values explicitly to string, you should see the contents of the CDATA just fine.
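To illustrate the casting advice, here is a minimal sketch (the sample document is hypothetical) that calls simplexml_import_dom() directly, with no LIBXML_NOCDATA anywhere, and still recovers the CDATA text through an explicit (string) cast.

```php
<?php
// Hypothetical sample product; element names are illustrative only.
$dom = new DOMDocument();
$dom->loadXML('<product><name><![CDATA[Widget & Co]]></name></product>');

// No LIBXML_NOCDATA here: the CDATA section is still a CDATA node in the DOM.
$product = simplexml_import_dom($dom);

// print_r()/var_dump() on $product->name may look empty, but an explicit
// string cast returns the CDATA content.
$name = (string) $product->name;
echo $name, PHP_EOL;
```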

For more, see http://php.net/function.simplexml-load-string.php#84365

Alternatively, the LIBXML_NOCDATA parameter you mention can be passed to simplexml_load_string. If you really need the XMLReader for some other reason, you could presumably use $reader->readOuterXML() instead of converting via a DOMDocument.
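A rough sketch of the readOuterXML() variant, again assuming a hypothetical in-memory `<catalog>` of `<product>` elements with a `<name>` child. Each matching element is serialized to a string and parsed with LIBXML_NOCDATA, so no intermediate DOMDocument is needed.

```php
<?php
// Hypothetical sample data; element names are illustrative only.
$xml = '<catalog>'
     . '<product><name><![CDATA[Widget]]></name></product>'
     . '<product><name><![CDATA[Gadget]]></name></product>'
     . '</catalog>';

$reader = new XMLReader();
$reader->XML($xml); // a file would use $reader->open($file) instead

$names = [];
while ($reader->read()) {
    if ($reader->nodeType === XMLReader::ELEMENT && $reader->localName === 'product') {
        // readOuterXML() returns this element and all of its children as a string
        $product = simplexml_load_string($reader->readOuterXML(), 'SimpleXMLElement', LIBXML_NOCDATA);
        $names[] = (string) $product->name;
    }
}
$reader->close();

print_r($names);
```

The loop keeps the question's read()-based structure, so the reader still walks into each product's children afterwards; those nodes simply don't match the `product` test.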