Using regex to filter attributes in xpath with php

Posted 2019-01-06 22:46

Question:

I am trying to filter HTML tables with a regex that matches their id attribute. What am I doing wrong? This is the code I am trying to implement:

$this->xpath = new DOMXPath($this->dom);
$this->xpath->registerNamespace("php", "http://php.net/xpath");
$this->xpath->registerPHPFunctions();
foreach ($this->xpath->query("//table[php:function('preg_match', '/post\d+/', @id)]") as $key => $row) {
    // process each matching table here
}

The error I get: preg_match expects second param to be a string, array given.

Answer 1:

In the DOM, an attribute is still a node in its own right (it has a namespace and so on), so @id is not passed to PHP as a plain string. Use:

//table[php:function('preg_match', '/post\d+/', string(@id))]

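Now, we need a boolean return value, so wrap preg_match in a function that returns true or false:

// Wrapper that turns preg_match's integer result into a real boolean.
function booleanPregMatch($match, $string) {
    return preg_match($match, $string) > 0;
}
// $xpath is a DOMXPath instance; the php: prefix must be registered as well.
$xpath->registerNamespace("php", "http://php.net/xpath");
$xpath->registerPHPFunctions();
foreach ($xpath->query("//table[@id and php:function('booleanPregMatch', '/post\d+/', string(@id))]") as $key => $row) {
    // print the markup of each matching table
    echo $row->ownerDocument->saveXML($row);
}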

BTW: for more complex issues, you can of course sneakily check what's happening with this:

//table[php:function('var_dump',@id)]

It's a shame we don't have XPath 2.0 functions available, but if you can handle this requirement with a less precise starts-with, I'd always prefer that over importing PHP functions.
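
As a rough illustration of the starts-with route, the sketch below filters on the id prefix using only XPath 1.0 functions, with no PHP callbacks. The sample markup and its ids are invented for the demonstration. The stricter variant, which uses translate to require that only digits follow the prefix, is one possible approximation and not an exact equivalent of the unanchored regex /post\d+/.

```php
<?php
// Hypothetical sample markup; the ids are made up for illustration.
$html = '<html><body>'
      . '<table id="post12"><tr><td>a</td></tr></table>'
      . '<table id="poster"><tr><td>b</td></tr></table>'
      . '<table id="other"><tr><td>c</td></tr></table>'
      . '<table><tr><td>d</td></tr></table>'
      . '</body></html>';

libxml_use_internal_errors(true); // keep parser notices out of the output
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Loose filter: any id that begins with "post" (this also accepts "poster").
$loose = $xpath->query("//table[starts-with(@id, 'post')]");

// Stricter filter: "post" followed by at least one character, all of them digits.
// translate() deletes every digit, so an empty remainder means digits only.
$strict = $xpath->query(
    "//table[starts-with(@id, 'post')"
    . " and string-length(substring-after(@id, 'post')) > 0"
    . " and translate(substring-after(@id, 'post'), '0123456789', '') = '']"
);

$looseIds = [];
foreach ($loose as $table) {
    $looseIds[] = $table->getAttribute('id');
}

$strictIds = [];
foreach ($strict as $table) {
    $strictIds[] = $table->getAttribute('id');
}

echo implode(', ', $looseIds), "\n";  // post12, poster
echo implode(', ', $strictIds), "\n"; // post12
```

Whether the loose or the strict form is appropriate depends on how the ids in the real document are built; if they need anything beyond a fixed prefix plus digits, the PHP callback is probably the simpler choice.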



Answer 2:

What am I doing wrong?

The XPath expression @id (passed on as the second parameter of preg_match) reaches PHP as an array, but preg_match expects a string.

Convert it to a string first: string(@id).

Beyond that, you need to compare the result to 1 explicitly, because preg_match returns 1 when the pattern matches:

foreach($xpath->query("//table[@id and 1 = php:function('preg_match', '/post\d+/', string(@id))]") as $key => $row)
{
    var_dump($key, $row, $row->ownerDocument->saveXml($row));
}
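
For reference, here is a minimal self-contained sketch of the same query run against a small document. The markup and ids are invented for the example, and passing 'preg_match' to registerPHPFunctions (to expose only that one function) is an optional choice, not something the query requires.

```php
<?php
// Hypothetical sample markup; only the second table has an id like "post<digits>".
$html = '<html><body>'
      . '<div><table id="other"><tr><td>a</td></tr></table>'
      . '<table id="post7"><tr><td>b</td></tr></table></div>'
      . '<table><tr><td>c</td></tr></table>'
      . '</body></html>';

libxml_use_internal_errors(true); // keep parser notices out of the output
$dom = new DOMDocument();
$dom->loadHTML($html);

$xpath = new DOMXPath($dom);
$xpath->registerNamespace('php', 'http://php.net/xpath');
$xpath->registerPHPFunctions('preg_match'); // expose only preg_match to XPath

// string(@id) hands preg_match a string instead of an array of DOMAttr nodes;
// "1 = ..." turns its result into an explicit boolean comparison.
$query = "//table[@id and 1 = php:function('preg_match', '/post\d+/', string(@id))]";

$matched = [];
foreach ($xpath->query($query) as $table) {
    $matched[] = $table->getAttribute('id');
}

print_r($matched); // a single entry: post7
```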

Explanation: what happens here?

An XPath expression returns a node-list by default (more precisely, a node-set). If you map a PHP function onto such an expression, the set is passed to the function as an array. You can easily test that with var_dump:

$xpath->query("php:function('var_dump', //table)");

array(1) {
  [0]=>
  object(DOMElement)#3 (0) {
  }
}

Same for the xpath expression @id in the context of each table element:

$xpath->query("//table[php:function('var_dump', @id)]");

array(1) {
  [0]=>
  object(DOMAttr)#3 (0) {
  }
}

You can turn that into a string-typed result with the XPath string function, which the XPath specification describes as follows:

"A node-set is converted to a string by returning the string-value of the node in the node-set that is first in document order. If the node-set is empty, an empty string is returned."

$xpath->query("//table[php:function('var_dump', string(@id))]");

string(4) "test"

(the table has id="test")
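
Because the node-set arrives as an array of nodes, another option is a callback that accepts the array directly, so the query does not need string() at all. The sketch below assumes a helper named idMatches, which is hypothetical and not part of PHP or of the code above; the sample markup is invented as well.

```php
<?php
// Hypothetical helper: $nodes is the node-set for @id, passed in as an array
// of DOMAttr objects (empty when the table has no id attribute).
function idMatches(array $nodes, string $pattern): bool
{
    foreach ($nodes as $node) {
        if (preg_match($pattern, $node->nodeValue) === 1) {
            return true;
        }
    }
    return false;
}

// Invented sample markup for the demonstration.
$html = '<html><body>'
      . '<table id="post3"><tr><td>a</td></tr></table>'
      . '<table id="test"><tr><td>b</td></tr></table>'
      . '<table><tr><td>c</td></tr></table>'
      . '</body></html>';

libxml_use_internal_errors(true); // keep parser notices out of the output
$dom = new DOMDocument();
$dom->loadHTML($html);

$xpath = new DOMXPath($dom);
$xpath->registerNamespace('php', 'http://php.net/xpath');
$xpath->registerPHPFunctions('idMatches'); // expose only the helper

// The helper returns a real boolean, so no comparison is needed in the predicate.
$ids = [];
foreach ($xpath->query("//table[php:function('idMatches', @id, '/post\d+/')]") as $table) {
    $ids[] = $table->getAttribute('id');
}

print_r($ids); // a single entry: post3
```

This keeps the type handling in PHP, at the cost of one more registered function; the string(@id) form shown earlier stays closer to plain XPath.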



Tags: php regex xpath