Use XPath with PHP's SimpleXML to find nodes c

Posted 2019-05-07 04:17

I am trying to use SimpleXML in combination with XPath to find nodes that contain a certain string.

<?php
$xhtml = <<<EOC
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="de" lang="de">
    <head>
        <meta http-equiv="content-type" content="text/html; charset=utf-8" />
        <title>Test</title>
    </head>
    <body>
        <p>Find me!</p>
        <p>
            <br />
            Find me!
            <br />
        </p>
    </body>
</html>
EOC;

$xml = simplexml_load_string($xhtml);
$xml->registerXPathNamespace('xhtml', 'http://www.w3.org/1999/xhtml');

$nodes = $xml->xpath("//*[contains(text(), 'Find me')]");

echo count($nodes);

Expected output: 2
Actual output: 1

When I change the XHTML of the second paragraph to

<p>
    Find me!
    <br />
 </p>

then it works as expected. What does my XPath expression have to look like to match all nodes containing 'Find me', no matter where they are?

Using PHP's DOM-XML is an option, but not desired.

Thanks in advance!

4 Answers
一夜七次
#2 · 2019-05-07 04:39
    $doc = new DOMDocument();
    $doc->loadHTML($xhtml);

    $xPath = new DOMXpath($doc);
    $xPathQuery = "//text()[contains(translate(.,'abcdefghijklmnopqrstuvwxyz', 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'), 'Find me')]";
    $elements = $xPath->query($xPathQuery);

    if ($elements->length > 0) {
        foreach ($elements as $element) {
            print "Found: " . $element->nodeValue . "<br />";
        }
    }
我只想做你的唯一
#3 · 2019-05-07 04:43

It depends on what you want to do. You could select all the <p/> elements that contain "Find me" in any of their descendants with

//xhtml:p[contains(., 'Find me')]

This will not return duplicates, but if you don't specify the kind of node (for example, by using * in place of xhtml:p) then it will return <body/> and <html/> as well.

Or perhaps you want any node which has a child (not a descendant) text node that contains "Find me"

//*[text()[contains(., 'Find me')]]

This one will not return <html/> or <body/>.


I forgot to mention that . represents the whole text content of a node. text() is used to retrieve [a nodeset of] text nodes. The problem with your expression contains(text(), 'Find me') is that contains() only works on strings, not nodesets and therefore it converts text() to the value of the first node, which is why removing the first <br/> makes it work.
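
To make the difference concrete, here is a small sketch that runs the expressions discussed above against a trimmed copy of the document from the question. Dropping the DOCTYPE and <head> is my own simplification, and the variable names are only illustrative.

```php
<?php
// Trimmed copy of the question's document (DOCTYPE and <head> left out).
$xhtml = <<<EOC
<html xmlns="http://www.w3.org/1999/xhtml">
    <body>
        <p>Find me!</p>
        <p>
            <br />
            Find me!
            <br />
        </p>
    </body>
</html>
EOC;

$xml = simplexml_load_string($xhtml);
$xml->registerXPathNamespace('xhtml', 'http://www.w3.org/1999/xhtml');

// Only <p> elements, tested against their whole text content.
$paragraphs = $xml->xpath("//xhtml:p[contains(., 'Find me')]");

// text() is reduced to the FIRST child text node, which is only
// whitespace in the second paragraph, so that paragraph is missed.
$firstTextOnly = $xml->xpath("//*[contains(text(), 'Find me')]");

// . is the text of all descendants, so <html> and <body> match too.
$anyElement = $xml->xpath("//*[contains(., 'Find me')]");

// Each child text node is tested on its own.
$withTextChild = $xml->xpath("//*[text()[contains(., 'Find me')]]");

printf(
    "%d %d %d %d\n",
    count($paragraphs),
    count($firstTextOnly),
    count($anyElement),
    count($withTextChild)
);
// prints "2 1 4 2"
```

The second number reproduces the count of 1 from the question. The last expression gives the expected 2 without pulling in <html/> or <body/>.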

家丑人穷心不美
#4 · 2019-05-07 04:43

Err, umm? But thanks @Jordy for the quick answer.

First, that's DOM-XML, which is not desired, since everything else in my script is done with SimpleXML.

Second, why do you translate to uppercase and then search for the unchanged string 'Find me'? Searching for 'FIND ME' would actually give a result.

But you pointed me in the right direction:

$nodes = $xml->xpath("//text()[contains(., 'Find me')]");

does the trick!
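
If a case-insensitive match is what you actually need, the translate() idea can be kept in SimpleXML as long as the needle is written in the same case as the translated text. This is a rough sketch: the sample markup and the variable names are made up for illustration. It also assumes the needle contains no quote characters, since it is interpolated straight into the XPath string.

```php
<?php
// Hypothetical sample markup, not the document from the question.
$xml = simplexml_load_string(
    '<div><p>FIND ME!</p><p><br/>find me too<br/></p><p>nothing here</p></div>'
);

// XPath 1.0 has no lower-case() function; translate() maps each
// upper-case ASCII letter to its lower-case counterpart instead.
$upper  = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ';
$lower  = 'abcdefghijklmnopqrstuvwxyz';
$needle = 'find me'; // must already be lower case to match the translated text

$nodes = $xml->xpath(
    "//text()[contains(translate(., '$upper', '$lower'), '$needle')]"
);

echo count($nodes), "\n";
// prints "2"
```

Only ASCII letters are folded here. Accented characters would need to be added to both alphabets.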

老娘就宠你
#5 · 2019-05-07 04:59

I was looking for a way to find whether a node with exact value "Find Me" exists and this seemed to work.

$node = $xml->xpath("//text()[.='Find Me']");
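
One caveat with the exact comparison: surrounding whitespace counts, so text that sits on its own line between tags will not match. Wrapping . in normalize-space() is one way around that. The sketch below uses made-up sample markup to show the difference.

```php
<?php
// Hypothetical sample: the second text node is padded with newlines and spaces.
$xml = simplexml_load_string(
    "<div><p>Find me!</p><p><br/>\n    Find me!\n<br/></p></div>"
);

// Exact comparison: only the unpadded text node matches.
$exact = $xml->xpath("//text()[.='Find me!']");

// normalize-space() trims leading and trailing whitespace and collapses
// internal runs before comparing, so both text nodes match.
$normalized = $xml->xpath("//text()[normalize-space(.)='Find me!']");

echo count($exact), ' ', count($normalized), "\n";
// prints "1 2"
```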