I'm trying to get all text nodes of /td/span.
I'm trying with xpath /td/span/text()
The problem is that each result contains ALL of the text nodes joined together: there are two text nodes here, "193" and "120", and it returns "193120" twice instead of "193" and "120" as separate results.
When I try the exact same XPath in any online tool it works fine, but in PHP I get completely different results.
I'm using SimpleXMLElement:
$xhtmlSnippet = '<td><span>193<span>10</span><span></span><div>66</div><span>195</span><span>.</span><span>34</span><span>242</span><span></span>120<span>64</span></span></td>';
$xml = new SimpleXMLElement($xhtmlSnippet);
$xresult = $xml->xpath('/td/span/text()');
foreach ($xresult as $xnode) {
    echo "<br /><br />NodeValue: " . $xnode;
}
Gives me:
NodeValue: 193120
NodeValue: 193120
Here is an example of it working properly via an online tool (ALL of the other online tools give the expected output also):
Working example in online tester
EDIT:
Using DOMDocument + DOMXPath, it seems to work as expected:
$dom = new DOMDocument;
$dom->loadXML($xhtmlSnippet);
$xpath = new DOMXPath($dom);
foreach ($xpath->query('/td/span/text()') as $textNode) {
    echo "\n\nTextNode: " . $textNode->nodeValue;
}
Gives:
TextNode: 193
TextNode: 120
A SimpleXMLElement can only represent elements and attributes, either individually or a collection of siblings of the same type. The ->xpath() method returns an array of SimpleXMLElement objects, which allows them to be non-siblings, but does not allow for any other node type.
Consequently, the expression /td/span/text() matches the two text nodes, but returns them as objects representing their parent element, which in this case happens to be the same <span> element, giving you an array with the same object in twice.
The remaining part of the puzzle is that when you cast a SimpleXML element to string it combines all its direct descendant text and CDATA nodes into one string, so the 193 and 120 get stuck together.
Thus the output is 193120, twice.
(This is definitely unintuitive behaviour, although it's hard to know quite what SimpleXML should do in this situation; perhaps it would be better to produce an error if the XPath expression resolves to something other than elements or attributes.)
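To see this for yourself, here is a small sketch that reuses the snippet from the question and prints the element name next to the string cast of each result. The variable name $matches is only illustrative.

```php
<?php
// Snippet taken from the question.
$xhtmlSnippet = '<td><span>193<span>10</span><span></span><div>66</div><span>195</span><span>.</span><span>34</span><span>242</span><span></span>120<span>64</span></span></td>';

$xml = new SimpleXMLElement($xhtmlSnippet);
$matches = $xml->xpath('/td/span/text()');

// Two text nodes matched, so two results come back...
echo count($matches), "\n";

foreach ($matches as $match) {
    // ...but each result is a SimpleXMLElement for the parent element,
    // and the string cast joins that element's direct text nodes.
    echo $match->getName(), ': ', (string) $match, "\n";
}
```

Each result should report span as its name and 193120 as its string value, which matches the output shown in the question.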
Since the DOM API has objects for every kind of node that can possibly exist in XML, and PHP includes a full implementation of that API, the XPath expression will work as expected there. What's more, the SimpleXML and DOM objects are actually both wrappers around the same internal memory structures, so you can write operations combining the two using dom_import_simplexml() and simplexml_import_dom().
As a slightly inelegant example, if you wanted to run an XPath expression in the context of an element you'd already traversed to with SimpleXML, you could do something like this:
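A minimal sketch of that approach, assuming the snippet from the question and a relative expression text() evaluated against the outer span. The variable names ($sxSpan, $domSpan, $texts) are illustrative only.

```php
<?php
// Snippet taken from the question.
$xhtmlSnippet = '<td><span>193<span>10</span><span></span><div>66</div><span>195</span><span>.</span><span>34</span><span>242</span><span></span>120<span>64</span></span></td>';

// Traverse with SimpleXML as usual; the root element is <td>.
$sx = new SimpleXMLElement($xhtmlSnippet);
$sxSpan = $sx->span;

// Switch to the DOM view of the same underlying node.
$domSpan = dom_import_simplexml($sxSpan);
$xpath = new DOMXPath($domSpan->ownerDocument);

// Evaluate a relative expression with $domSpan as the context node.
// DOM can represent text nodes, so each one comes back separately.
$texts = [];
foreach ($xpath->query('text()', $domSpan) as $textNode) {
    $texts[] = $textNode->nodeValue;
    echo "TextNode: ", $textNode->nodeValue, "\n";
}
```

Because the two APIs share the same underlying document, nothing is copied here; only the wrapper object changes.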
Obviously, you could wrap this up into a function as desired. Also note that since your expression starts at the document root (leading /) the actual context is irrelevant, which is why I've used a slightly different expression above.