I am trying to use the following XPath query in Python:
from lxml.html.soupparser import fromstring
root = fromstring(inString)
nodes = root.xpath(".//p3[matches(.,'ABC')]//preceding::p2//p3")
but it gives me this error:
nodes = root.xpath(".//p3[matches(.,'ABC')]//preceding::p2//p3")
File "lxml.etree.pyx", line 1507, in lxml.etree._Element.xpath (src\lxml\lxml.etree.c:52198)
File "xpath.pxi", line 307, in lxml.etree.XPathElementEvaluator.__call__ (src\lxml\lxml.etree.c:152124)
File "xpath.pxi", line 227, in lxml.etree._XPathEvaluatorBase._handle_result (src\lxml\lxml.etree.c:151097)
File "xpath.pxi", line 212, in lxml.etree._XPathEvaluatorBase._raise_eval_error (src\lxml\lxml.etree.c:150896)
lxml.etree.XPathEvalError: Unregistered function
How can I use XPath 2.0 functions here with lxml?
Clarification
I was using the contains function earlier, as in:
nodes = root.xpath(".//p3[contains(text(),'ABC')]//preceding::p2//p3")
The problem is that my XML has newlines and whitespace in the text, so I tried something like:
nodes = root.xpath(".//p3[contains(normalize-space(),'ABC')]//preceding::p2//p3")
but this has no effect. Finally, I tried the matches function and got the error shown above.
Sample XML
<doc>
<q></q>
<p1>
<p2 dd="ert" ji="pp">
<p3>1</p3>
<p3>2</p3>
<p3>
ABC
</p3>
<p3>3</p3>
</p2>
<p2 dd="ert" ji="pp">
<p3>4</p3>
<p3>5</p3>
<p3>ABC</p3>
<p3>6</p3>
</p2>
</p1>
<r></r>
<p1>
<p2 dd="ert" ji="pp">
<p3>7</p3>
<p3>8</p3>
<p3>ABC
</p3>
<p3>9</p3>
</p2>
<p2 dd="ert" ji="pp">
<p3>10</p3>
<p3>11</p3>
<p3>ABC</p3>
<p3>12</p3>
</p2>
</p1>
</doc>
You cannot (reference): lxml supports XPath 1.0, not XPath 2.0, so matches() is not available. contains() is probably the closest you can get in this case.
As mentioned in the other answer, which stresses the other part of the quoted documentation, you can use the EXSLT extensions to get a regex match() function with lxml, for example:
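A minimal sketch of that approach, under a few assumptions: the sample is a trimmed-down, hypothetical variant of the XML in the question, it is parsed with lxml.etree rather than soupparser, and the predicate uses the EXSLT function re:test() (which returns a boolean) instead of re:match(). The prefix "re" is arbitrary; what matters is that it is bound to the EXSLT regular-expressions namespace.

```python
from lxml import etree

# Hypothetical, trimmed-down sample modelled on the XML in the question.
SAMPLE = """<doc>
<p1>
<p2 dd="ert" ji="pp">
<p3>1</p3>
<p3>
ABC
</p3>
</p2>
<p2 dd="ert" ji="pp">
<p3>4</p3>
<p3>ABC</p3>
</p2>
</p1>
</doc>"""

root = etree.fromstring(SAMPLE)

# Bind a prefix to the EXSLT regular-expressions namespace.
EXSLT_RE = {"re": "http://exslt.org/regular-expressions"}

# Plain XPath 1.0: substring test on the element's full string value.
by_contains = root.xpath(".//p3[contains(., 'ABC')]")

# EXSLT regex test; the pattern tolerates surrounding whitespace/newlines.
by_regex = root.xpath(r".//p3[re:test(., '^\s*ABC\s*$')]", namespaces=EXSLT_RE)

# The query from the question, with matches() swapped for re:test().
nodes = root.xpath(
    ".//p3[re:test(., 'ABC')]//preceding::p2//p3",
    namespaces=EXSLT_RE,
)

for node in nodes:
    print(node.tag, repr(node.text))
```

Note that the preceding axis excludes ancestors, so the p2 that contains a matching p3 is not selected by preceding::p2; only p2 elements that end before the match are.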