I want to parse HTML with lxml using XPath expressions. My problem is matching on the contents of a tag.
For example, given the
<a href="http://something">Example</a>
element, I can match the href attribute using
.//a[@href='http://something']
but given the expression
.//a[.='Example']
or even
.//a[contains(.,'Example')]
lxml raises an 'invalid node predicate' exception.
What am I doing wrong?
EDIT:
Example code:
from lxml import etree
from cStringIO import StringIO
html = '<a href="http://something">Example</a>'
parser = etree.HTMLParser()
tree = etree.parse(StringIO(html), parser)
print tree.find(".//a[text()='Example']").tag
Expected output is 'a'. I get 'SyntaxError: invalid node predicate'
I would try
.//a[text()='Example']
with the xpath() method instead of find().
In case you would like to use iterfind(), findall(), find(), or findtext(), keep in mind that advanced features like value comparison and functions are not available in ElementPath.
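As a rough sketch of the difference, the snippet below runs the three predicates from the question through xpath() and then tries the same text() predicate with find(). It assumes Python 3 (so io.StringIO stands in for cStringIO) and an installed lxml. The variable names are only illustrative, and the exact wording of the error raised by find() may vary between lxml versions.

```python
from io import StringIO

from lxml import etree

html = '<a href="http://something">Example</a>'
tree = etree.parse(StringIO(html), etree.HTMLParser())

# xpath() evaluates full XPath 1.0, so text(), comparisons on '.'
# and functions such as contains() are all accepted.
by_text = tree.xpath(".//a[text()='Example']")
by_value = tree.xpath(".//a[.='Example']")
by_contains = tree.xpath(".//a[contains(., 'Example')]")

for matches in (by_text, by_value, by_contains):
    # xpath() returns a list of matching elements
    print([el.tag for el in matches])

# find() goes through ElementPath, which only understands a subset of
# XPath; a predicate it cannot parse is reported as a SyntaxError.
try:
    tree.find(".//a[text()='Example']")
    find_error = None
except SyntaxError as exc:
    find_error = exc

print(find_error)
```

Note that xpath() always returns a list, so take the first item (after checking that the list is not empty) where find() would have returned a single element.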