I'm running Scrapy 0.20.2.
$ scrapy shell "http://newyork.craigslist.org/ata/"
I would like to build a list of all links to advertisement pages, excluding the index pages (such as index100.html).
$ sel.xpath('//a[contains(@href,"html")]')
...
<Selector xpath='//a[contains(@href,"html")]' data=u'<a href="/mnh/atq/4243973984.html">Wicke'>,
<Selector xpath='//a[contains(@href,"html")]' data=u'<a href="/mnh/atd/4257230057.html" class'>,
<Selector xpath='//a[contains(@href,"html")]' data=u'<a href="/mnh/atd/4257230057.html">Recla'>,
<Selector xpath='//a[contains(@href,"html")]' data=u'<a href="/ata/index100.html" class="butt'>]
I would like to use the XPath matches() function to select only the links whose href matches the regex [0-9]+.html:
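Separately from the XPath error, the regex as written may not exclude the index pages. In [0-9]+.html the "." is unescaped (it matches any character) and nothing anchors the digits, so "/ata/index100.html" also matches through its "100.html" suffix. The sketch below checks this with Python's standard re module against the hrefs from the shell output above; the tighter pattern /[0-9]+\.html$ is only a suggestion, assuming every advertisement URL ends in "/<digits>.html".

```python
import re

# hrefs copied from the shell output above (duplicate entry dropped)
hrefs = [
    "/mnh/atq/4243973984.html",
    "/mnh/atd/4257230057.html",
    "/ata/index100.html",
]

# The pattern as written: "." is unescaped and nothing anchors the digits,
# so "index100.html" also matches via its "100.html" suffix.
loose = [h for h in hrefs if re.search(r"[0-9]+.html", h)]

# A tighter pattern (assumption: ad URLs end in "/<digits>.html"):
# a slash right before the digits, a literal dot, and end of string.
strict = [h for h in hrefs if re.search(r"/[0-9]+\.html$", h)]

print(loose)   # all three hrefs
print(strict)  # only the two advertisement links
```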
$ sel.xpath('//a[matches(@href,"[0-9]+.html")]')
...
ValueError: Invalid XPath: //a[matches(@href,"[0-9]+.html")]
What's wrong? Thank you.
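A likely cause: matches() is an XPath 2.0 function, while Scrapy's selectors are built on lxml, which implements only XPath 1.0, so the expression is rejected as invalid. lxml does support the EXSLT regular-expression extension (namespace http://exslt.org/regular-expressions, commonly bound to the prefix re, e.g. re:test(@href, "...")); whether Scrapy 0.20's selector registers that prefix for you is an assumption worth checking in the docs for your version. A version-independent workaround is to extract every href with plain XPath 1.0 (for example //a/@href) and filter the strings in Python. The minimal sketch below mimics that with the standard library's html.parser standing in for the Scrapy selector; the HTML snippet and the helper names HrefCollector and ad_links are hypothetical.

```python
import re
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Collect the href of every <a> tag, roughly what //a/@href returns."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.hrefs.append(value)


# Assumption: ad pages end in "/<digits>.html"; index pages like
# /ata/index100.html have letters right after the slash.
AD_LINK = re.compile(r"/[0-9]+\.html$")


def ad_links(html):
    """Return only the hrefs that look like advertisement pages."""
    parser = HrefCollector()
    parser.feed(html)
    return [h for h in parser.hrefs if AD_LINK.search(h)]


# Hypothetical snippet reusing the hrefs from the shell output above.
sample = """
<a href="/mnh/atq/4243973984.html">ad title</a>
<a href="/mnh/atd/4257230057.html" class="i">image</a>
<a href="/ata/index100.html" class="button">next 100</a>
"""
print(ad_links(sample))
```

In the Scrapy shell the same filter would apply to the list returned by sel.xpath('//a/@href').extract().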