I had to rewrite part of a programme to use XMLReader to select parts of an XML file for processing.
Take this simplified XML as an example:
<odds>
<sport>
<region>
<group>
<event name="English Championship 2014-15" eventid="781016.1">
<bet name="Kazanan" betid="12377108.1">
<selection selectionid="52411062.1"/>
</bet>
</event>
</group>
</region>
</sport>
</odds>
This call to xpath():
$bets = $xml->xpath(
"//odds/sport/region/group/event/bet/selection[contains(@selectionid,'".$selectionToFind."')]/.."
);
would select the whole <bet> node and its children (the <selection> nodes).
My code, however, would select only one <selection> node with a given selectionid:
$reader = new XMLReader;
$reader->open('file.xml');
while($reader->read()) {
$selectionid = $reader->getAttribute('selectionid');
if ($selectionid === '52411062.1') {
$node = new SimpleXMLElement($reader->readOuterXML());
var_dump($node);
break;
}
}
How can I replicate the behaviour of xpath() with XMLReader so that I select the <bet> node and its children, not only one <selection> child?
I guess the question boils down to: can I select the whole parent node <bet> by the attribute value of a child, e.g. <selection selectionid="[some_value]">?
[Ignore the SimpleXML solution and look down at the XMLReader one]
I would suggest using the SimpleXMLElement::xpath method.
http://php.net/manual/en/simplexmlelement.xpath.php
$xml = new SimpleXMLElement($xml_string);
/* Select every <bet> element under /odds/sport/region/group/event */
$result = $xml->xpath("/odds/sport/region/group/event/bet");
$result will contain every <bet> node, together with its children.
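If you stay with SimpleXML, the XPath predicate can filter on the child's attribute directly, so only the matching <bet> is returned instead of every <bet>. This is a minimal sketch, assuming the sample document is held in a string ($xml_string and $selectionToFind are illustrative names) and that the id contains no quote characters, since it is interpolated straight into the expression:

```php
<?php
// Sample document from the question, kept in a string for this sketch.
$xml_string = <<<'XML'
<odds>
  <sport>
    <region>
      <group>
        <event name="English Championship 2014-15" eventid="781016.1">
          <bet name="Kazanan" betid="12377108.1">
            <selection selectionid="52411062.1"/>
          </bet>
        </event>
      </group>
    </region>
  </sport>
</odds>
XML;

$xml = new SimpleXMLElement($xml_string);
$selectionToFind = '52411062.1'; // assumed not to contain quotes

// The predicate keeps only <bet> elements that have a <selection> child
// whose selectionid attribute equals $selectionToFind.
$result = $xml->xpath(
    "/odds/sport/region/group/event/bet[selection/@selectionid = '$selectionToFind']"
);

foreach ($result as $bet) {
    echo $bet->asXML(), "\n";
}
```

This still loads the whole document, so it only helps if memory is not the problem.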
// XMLReader solution **********************
$reader = new XMLReader;
$reader->open('file.xml');
$parent_element = null;
while($reader->read()) {
$selectionid = $reader->getAttribute('selectionid');
if ($selectionid == '52411062.1' ) {
// use the parent of the node with attribute 'selectionid' = '52411062.1'
$node = $parent_element;
var_dump($node);
break;
}
elseif ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'bet') {
// store parent element
$parent_element = new SimpleXMLElement($reader->readOuterXML());
}
}
DOMXPath is said to perform better than SimpleXML (it has other advantages as well, e.g. it handles namespaces properly). See, for example, this IBM article for a discussion of several XPath libraries in PHP.
I'm just curious whether your performance issue would persist (or still be as severe) when using DOMXPath:
<?php
$doc = new DOMDocument;
$doc->load('sample.xml');
$xpath = new DOMXPath($doc);
$nodes = $xpath->query("/odds/sport/region/group/event/bet[selection/@selectionid = '52411062.1']");
foreach ($nodes as $node)
{
print $xml = $node->ownerDocument->saveXML($node);
}
?>
Given the small snippet you have shown as input, the result is:
<bet name="Kazanan" betid="12377108.1">
<selection selectionid="52411062.1"/>
</bet>
If that does not help, you really have to resort to a streaming (pull-style) XML parser that does not read the whole document into memory, as Yasen suggests.
XMLReader can expand() the current node into a DOMNode. This loads only that node and its descendants into memory.
After that, you can use a DOMXPath instance or convert the node into a SimpleXMLElement.
$reader = new XMLReader();
// $xml holds the XML document as a string; it is passed in as a data: URL
$reader->open('data:/text/xml,'.urlencode($xml));
$dom = new DOMDocument();
$xpath = new DOMXPath($dom);
while ($reader->read()) {
    if (
        $reader->nodeType === XMLReader::ELEMENT &&
        $reader->localName === 'bet'
    ) {
        // load only this <bet> subtree into $dom
        $bet = $reader->expand($dom);
        // evaluate the XPath expression relative to the expanded <bet> node
        if ($xpath->evaluate('count(selection[@selectionid = "52411062.1"]) > 0', $bet)) {
            var_dump($dom->saveXML($bet));
        }
    }
}
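The other option mentioned above, converting the expanded node into a SimpleXMLElement, could look roughly like this. It is a sketch that assumes the XML is available as a string and loads it with XMLReader::XML() instead of a data: URL; simplexml_import_dom() does the conversion, and $matches is a hypothetical collector for the betid values of matching bets:

```php
<?php
// Sample document from the question, kept in a string for this sketch.
$xml = <<<'XML'
<odds><sport><region><group>
  <event name="English Championship 2014-15" eventid="781016.1">
    <bet name="Kazanan" betid="12377108.1">
      <selection selectionid="52411062.1"/>
    </bet>
  </event>
</group></region></sport></odds>
XML;

$reader = new XMLReader();
$reader->XML($xml);

$dom = new DOMDocument();
$matches = []; // hypothetical collector for the betid of each matching <bet>

while ($reader->read()) {
    if ($reader->nodeType === XMLReader::ELEMENT && $reader->localName === 'bet') {
        // expand() copies only this <bet> subtree into $dom.
        $node = $reader->expand($dom);
        // simplexml_import_dom() wraps the DOM element as a SimpleXMLElement.
        $bet = simplexml_import_dom($node);
        foreach ($bet->selection as $selection) {
            if ((string) $selection['selectionid'] === '52411062.1') {
                $matches[] = (string) $bet['betid'];
                echo $bet->asXML(), "\n";
                break;
            }
        }
    }
}
$reader->close();
```

Which variant to use is mostly a matter of taste: DOMXPath keeps the matching logic in one expression, while SimpleXML lets you use property-style access on the small fragment.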
You will always have to decide which part to implement with XMLReader and which with DOM/SimpleXML. With XMLReader you have to check the node types yourself and maintain state, but you avoid loading the whole document into memory. At some point during parsing, the XML fragments will be small enough that you can call expand() and continue with DOM.
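To illustrate the "maintain state" part, here is a sketch that remembers which <event> the reader is currently inside and only expands <bet> elements that belong to the event with eventid 781016.1. The second event in the sample input, as well as $currentEvent and $found, are hypothetical and only there for illustration:

```php
<?php
// Sample input: the question's event plus a second, hypothetical event
// that contains the same selectionid but must be ignored.
$xml = <<<'XML'
<odds><sport><region><group>
  <event name="Hypothetical other event" eventid="1.1">
    <bet name="Other" betid="1.2">
      <selection selectionid="52411062.1"/>
    </bet>
  </event>
  <event name="English Championship 2014-15" eventid="781016.1">
    <bet name="Kazanan" betid="12377108.1">
      <selection selectionid="52411062.1"/>
    </bet>
  </event>
</group></region></sport></odds>
XML;

$reader = new XMLReader();
$reader->XML($xml);
$dom = new DOMDocument();
$xpath = new DOMXPath($dom);

$currentEvent = null; // state: eventid of the <event> we are currently inside
$found = [];          // betid values of the matching <bet> elements

while ($reader->read()) {
    $isElement = $reader->nodeType === XMLReader::ELEMENT;
    if ($isElement && $reader->localName === 'event') {
        // An empty <event/> produces no END_ELEMENT, so do not enter it.
        $currentEvent = $reader->isEmptyElement ? null : $reader->getAttribute('eventid');
    } elseif ($reader->nodeType === XMLReader::END_ELEMENT && $reader->localName === 'event') {
        $currentEvent = null;
    } elseif ($isElement && $reader->localName === 'bet' && $currentEvent === '781016.1') {
        // The <bet> fragment is small enough to hand over to DOM.
        $bet = $reader->expand($dom);
        if ($xpath->evaluate('count(selection[@selectionid = "52411062.1"]) > 0', $bet)) {
            $found[] = $bet->getAttribute('betid');
        }
    }
}
$reader->close();

print_r($found);
```

The state here is a single variable, but the same pattern extends to a stack of open elements if you need to match deeper paths.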