Reading Child Nodes with XMLReader

Posted 2019-09-05 08:58

I'm trying to write an XMLReader/SimpleXML hybrid function to read a very large (700MB) XML file. The XML is in this format:

<Items>
    <Item>
         <ItemKey>ABCDEF123</ItemKey>
         <Name>
             <English>An Item Name</English>
             <German>An Item Name In German</German>
             <French>An Item Name In French</French>
         </Name>
         <Description>
             <English>An Item Description</English>
             <German>An Item Description In German</German>
             <French>An Item Description In French</French>
         </Description>
    </Item>
    <Item>
         <ItemKey>GHIJKL456</ItemKey>
         <Name>
             <English>Another Item Name</English>
             <German>Another Item Name In German</German>
             <French>Another Item Name In French</French>
         </Name>
         <Description>
             <English>Another Item Description</English>
             <German>Another Item Description In German</German>
             <French>Another Item Description In French</French>
         </Description>
    </Item>
</Items>

The code I have written so far to do this:

$xml = new XMLReader();
if(!$xml->open('testitems.xml')){
    die('Failed to open file!');
} else {
    echo 'File opened';
}

$items = array();

while ($xml->read()){
    if($xml->nodeType == XMLReader::ELEMENT){
        if ($xml->name == 'Item'){
            $item = array();
        }

        if ($xml->name == 'ItemKey'){
            $xml->read();
            $item['itemKey'] = $xml->value;
        }
        if ($xml->name == 'Name'){
            $sxml = new SimpleXMLElement($xml->readOuterXml());
            $englishName = $sxml->English;
            $item['englishName'] = $englishName;
        }
    }
    if($xml->nodeType == XMLReader::END_ELEMENT){
        if ($xml->name == 'Item'){
            $items[] = $item;
        }
    }
}
var_dump($items);
$xml->close();

However, while the ItemKey node value is being inserted into the array, the English name is not; I can't seem to access this node properly. I would just use XMLReader for everything, but the English node occurs repeatedly (once under Name, again under Description), so from my Googling so far SimpleXML seemed the way forward. No joy as yet.

Any suggestions? Any good guides? XMLReader documentation on php.net is woefully lacking in comparison to many other PHP features, and in general it seems hard to find detailed guides that are clear and concise.

2 Answers
来,给爷笑一个
#2 · 2019-09-05 09:15

Never mind, figured it out. For anyone else who gets stuck on this:

$xml = new XMLReader();
if(!$xml->open('Items.xml')){
    die('Failed to open file!');
} else {
    echo 'File opened';
}

$items = array();

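// Skip ahead to the first <Item> element.
while ($xml->read() && $xml->name !== "Item");

// Process each <Item>, then jump to its next <Item> sibling.
while ($xml->name === "Item") {
    $item = array();
    $node = new SimpleXMLElement($xml->readOuterXML());
    // Cast to string so the array holds plain values, not SimpleXMLElement objects.
    $item['itemkey'] = (string)$node->ItemKey;
    $item['englishName'] = (string)$node->Name->English;
    $item['englishDesc'] = (string)$node->Description->English;
    $items[] = $item;
    $xml->next('Item'); // without this call the loop never advances
}
$xml->close();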
爷、活的狠高调
#3 · 2019-09-05 09:30

Well, if you can still build that array, your XML file is probably not that large :). Try loading the whole file with SimpleXML, for example; you might be surprised that it does not consume that much memory.
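A minimal sketch of that experiment, assuming a small temporary file as a stand-in for the real one (the temp path and the two sample items are made up for illustration). It loads the whole document with simplexml_load_file() and prints the peak memory PHP reports. Note that memory_get_peak_usage() only covers what PHP's own allocator tracks, so memory held by the underlying libxml2 may not be fully reflected; checking the process size as well is a reasonable precaution.

```php
<?php
// Hypothetical stand-in for the real file: a tiny sample written to a temp path.
$xmlFile = tempnam(sys_get_temp_dir(), 'items');
file_put_contents($xmlFile, <<<'XML'
<Items>
    <Item>
        <ItemKey>ABCDEF123</ItemKey>
        <Name><English>An Item Name</English></Name>
    </Item>
    <Item>
        <ItemKey>GHIJKL456</ItemKey>
        <Name><English>Another Item Name</English></Name>
    </Item>
</Items>
XML
);

// Parse the entire document into memory in one go.
$doc = simplexml_load_file($xmlFile);
if ($doc === false) {
    die('Failed to load file!');
}

$items = array();
foreach ($doc->Item as $item) {
    $items[] = array(
        'itemKey'     => (string)$item->ItemKey,
        'englishName' => (string)$item->Name->English,
    );
}

// Peak memory as seen by PHP's allocator, in megabytes.
printf("Items: %d, peak memory: %.1f MB\n", count($items), memory_get_peak_usage(true) / 1048576);

unlink($xmlFile); // remove the temporary sample file
```

Running the same script against the real file shows quickly whether the streaming approach is needed at all.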

Anyway, if you still want to use XMLReader, I often suggest my XMLReader Iterator library that is able to iterate over an XMLReader to access elements, children and do stuff like turning fragments into SimpleXMLElements.

The following is an example which is nearly identical to your example above:

require('xmlreader-iterators.php'); // https://github.com/hakre/XMLReaderIterator/tree/master/build/include

$xmlFile = "xmlreader-17262798.xml";

$reader = new XMLReader();
$reader->open($xmlFile);

/* @var $itemIterator XMLReaderNode[] */
$itemIterator = new XMLElementIterator($reader, 'Item');

$items = array();

foreach ($itemIterator as $item) {
    $xml     = $item->asSimpleXML();
    $items[] = array(
        'itemKey'     => (string)$xml->ItemKey,
        'englishName' => (string)$xml->Name->English,
    );
}

When you run it on your demo data, the resulting $items array is:

Array
(
    [0] => Array
        (
            [itemKey] => ABCDEF123
            [englishName] => An Item Name
        )

    [1] => Array
        (
            [itemKey] => GHIJKL456
            [englishName] => Another Item Name
        )

)

Technically you don't need to use that library, it only operates on an XMLReader so it doesn't change how XMLReader works. It's an add-on.

Why it doesn't work in your specific case is hard to say; your code ran flawlessly on my computer:

Array
(
    [0] => Array
        (
            [itemKey] => ABCDEF123
            [englishName] => SimpleXMLElement Object
                (
                    [0] => An Item Name
                )

        )

    [1] => Array
        (
            [itemKey] => GHIJKL456
            [englishName] => SimpleXMLElement Object
                (
                    [0] => Another Item Name
                )

        )

)

As this print_r output of $items (from your code) shows, the englishName keys are set to SimpleXMLElement objects. You might want to cast them to string, as I did in my example with the two (string) casts, so that the array holds strings instead of SimpleXMLElements; that was probably your issue. If not, check your libxml version:

var_dump(LIBXML_DOTTED_VERSION); # string(5) "2.7.8"

And report it back (that is the library XMLReader is based on). Also debug your SimpleXMLElement (var_dump($sxml->asXML());) so you can check the expected XML has been loaded.
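To make the cast point concrete, here is a small sketch using an inline fragment as a stand-in for one <Name> element from the question's data. It only shows the difference between keeping the SimpleXMLElement and casting it to a string.

```php
<?php
// Inline sample standing in for one <Name> fragment from the question's data.
$sxml = new SimpleXMLElement(
    '<Name><English>An Item Name</English><German>An Item Name In German</German></Name>'
);

$asObject = $sxml->English;          // still a SimpleXMLElement
$asString = (string)$sxml->English;  // plain PHP string

var_dump($asObject instanceof SimpleXMLElement); // bool(true)
var_dump($asString);                             // string(12) "An Item Name"
```

Storing the cast value also means the array no longer holds references into the parsed fragment.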

By the way, the library I suggest also comes as a single include file if you want to try it quickly.

Last time I suggested that library was in:


Edit: An additional hybrid version without the library, showing the use of next(), which is useful because you always iterate over same-named siblings (<Item>):

$xmlFile = "xmlreader-17262798.xml";

$reader = new XMLReader();
$reader->open($xmlFile);

$reader->read() && $reader->read(); // init and position onto first element

$items = array();
while ($reader->next('Item')) {
    $node = new SimpleXMLElement($reader->readOuterXML());

    $items[] = array(
        'itemkey'     => $node->ItemKey,
        'englishName' => $node->Name->English,
        'englishDesc' => $node->Description->English,
    );
}
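One caveat with the positioning step above: as far as I can tell, next('Item') advances first and compares names afterwards, so the two read() calls work here because the second one lands on the whitespace before the first <Item>. With minified input the reader would already sit on the first <Item> and that item would be skipped. The sketch below avoids that by reading until the first matching element and only then switching to next(). The function name iterateElements is made up for illustration, and the values are cast to strings.

```php
<?php
// Hypothetical helper: yields each element named $name as a SimpleXMLElement.
function iterateElements(XMLReader $reader, $name)
{
    // Advance until the reader sits on the first matching element (or input ends).
    while ($reader->read()) {
        if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === $name) {
            break;
        }
    }

    // Hand out the current element, then jump to the next same-named sibling.
    while ($reader->nodeType === XMLReader::ELEMENT && $reader->name === $name) {
        yield new SimpleXMLElement($reader->readOuterXml());
        if (!$reader->next($name)) {
            break;
        }
    }
}

// Minified sample: no whitespace between <Items> and the first <Item>.
$xmlString = '<Items>'
    . '<Item><ItemKey>ABCDEF123</ItemKey><Name><English>An Item Name</English></Name></Item>'
    . '<Item><ItemKey>GHIJKL456</ItemKey><Name><English>Another Item Name</English></Name></Item>'
    . '</Items>';

$reader = new XMLReader();
$reader->XML($xmlString);

$items = array();
foreach (iterateElements($reader, 'Item') as $node) {
    $items[] = array(
        'itemKey'     => (string)$node->ItemKey,
        'englishName' => (string)$node->Name->English,
    );
}
$reader->close();

print_r($items);
```

Because the helper is a generator, only one <Item> fragment is expanded into a SimpleXMLElement at a time.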