Python lxml.html XPath “attribute not equal” operator

Posted 2019-02-26 04:36

Question:

I'm trying to run the following script:

#!python

from urllib import urlopen #urllib.request for python3
from lxml import html

url =   'http://mpk.lodz.pl/rozklady/1_11_D2D3/00d2/00d2t001.htm?r=KOZINY'+\
        '%20-%20Srebrzy%F1ska,%20Cmentarna,%20Legion%F3w,%20pl.%20Wolno%B6ci'+\
        ',%20Pomorska,%20Kili%F1skiego,%20Przybyszewskiego%20-%20LODOWA'

raw_html = urlopen(url).read()
tree = html.fromstring(raw_html) #need to .decode('windows-1250') in python3
ret = tree.xpath('//td [@class!="naglczas"]')
print ret
assert(len(ret)==1)

I expect it to select the one td that doesn't have its class set to 'naglczas'. Instead, it returns an empty list. Why is that? I guess there's some silly reason, but I tried googling and found nothing that explains it.
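For reference, here is an offline reproduction of the symptom. It is a minimal sketch in Python 3 that assumes a hypothetical four-cell table standing in for the real page: three cells with class="naglczas" and one with no class attribute at all. The live timetable URL may no longer serve the same markup.

```python
from lxml import html

# Hypothetical stand-in for the timetable page: three cells carry
# class="naglczas" and exactly one cell has no class attribute.
SNIPPET = """
<html><body><table>
  <tr><td class="naglczas">Godz.</td><td class="naglczas">Min.</td></tr>
  <tr><td class="naglczas">5</td><td>10 25 40 55</td></tr>
</table></body></html>
"""

tree = html.fromstring(SNIPPET)

# The predicate from the question, run against the hypothetical markup.
ret = tree.xpath('//td[@class!="naglczas"]')
print(ret)  # → []
```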

Answer 1:

Your XPath expression finds

a td element that has a class which is not "naglczas"

You seem to want (since the only three td elements with a class all have the class you don't want)

a td element which does not have a class of "naglczas"

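Those might sound similar, but they are different. In XPath 1.0, a comparison against a node-set such as @class is true only if some node in that set satisfies it. A td with no class attribute gives an empty node-set, so @class!="naglczas" is false for it, while not(@class="naglczas") is true. Something like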

tree.xpath('//td[not(@class="naglczas")]')

should get you what you want.
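The difference can be checked cell by cell. This sketch assumes the same kind of hypothetical four-cell markup (not the real page) and evaluates both predicates on each td; lxml returns XPath boolean results as Python bool values. The third query, not(@class) or @class!="naglczas", is an equivalent spelling that makes the missing-attribute case explicit.

```python
from lxml import html

# Hypothetical markup: three td elements with class="naglczas", one without a class.
SNIPPET = """
<html><body><table>
  <tr><td class="naglczas">Godz.</td><td class="naglczas">Min.</td></tr>
  <tr><td class="naglczas">5</td><td>10 25 40 55</td></tr>
</table></body></html>
"""

tree = html.fromstring(SNIPPET)

# "Has a class attribute, and its value is not naglczas": matches nothing here.
has_other_class = tree.xpath('//td[@class!="naglczas"]')

# "Does not have class naglczas": also matches a td with no class attribute.
not_naglczas = tree.xpath('//td[not(@class="naglczas")]')

# Equivalent form that spells out the missing-attribute case.
explicit = tree.xpath('//td[not(@class) or @class!="naglczas"]')

# Evaluate both predicates on each cell (XPath booleans come back as bool).
rows = [(td.get('class'),
         td.xpath('@class != "naglczas"'),
         td.xpath('not(@class = "naglczas")'))
        for td in tree.xpath('//td')]
for row in rows:
    print(row)
```

Only the last row, the cell without a class, differs between the two predicates, which is exactly the cell the question wants.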

Also, you don't need urllib to open the URL; lxml can do that for you with lxml.html.parse().
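A sketch of the lxml.html.parse() route. select_unmarked_cells is a hypothetical helper name. html.parse() accepts a filename, a URL, or a file-like object. The encoding argument is an assumption aimed at the windows-1250 page from the question. A BytesIO object stands in for the page so the example runs offline.

```python
import io
from lxml import html

def select_unmarked_cells(source, encoding=None):
    """Hypothetical helper: parse `source` (filename, URL or file-like
    object) and return the td elements without class="naglczas"."""
    # An explicit encoding (e.g. 'windows-1250') overrides charset detection.
    parser = html.HTMLParser(encoding=encoding)
    doc = html.parse(source, parser=parser)  # an lxml ElementTree
    return doc.xpath('//td[not(@class="naglczas")]')

# Offline stand-in for the page. Passing the question's http:// URL string
# instead should let lxml fetch it itself, depending on how libxml2 was
# built; https:// URLs are not supported by libxml2's fetcher.
page = io.BytesIO(
    b'<html><body><table><tr>'
    b'<td class="naglczas">5</td><td>10 25 40 55</td>'
    b'</tr></table></body></html>'
)
cells = select_unmarked_cells(page, encoding='windows-1250')
print([td.text for td in cells])  # → ['10 25 40 55']
```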
