Scrapy restrict_xpath syntax error

I'm trying to limit Scrapy to a particular XPath location for following links. The XPath is correct (according to the XPath Helper plugin for Chrome), but when I run my CrawlSpider I get a syntax error at my Rule.

My Spider code is:

from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
from scrapy.selector import HtmlXPathSelector
from tutorial.items import BassItem

import logging
from scrapy.log import ScrapyFileLogObserver

logfile = open('testlog.log', 'w')
log_observer = ScrapyFileLogObserver(logfile, level=logging.DEBUG)
log_observer.start()


class BassSpider(CrawlSpider):
    name = "bass"
    allowed_domains = ["talkbass.com"]
    start_urls = ["http://www.talkbass.com/forum/f126"]


    rules = [Rule(SgmlLinkExtractor(allow=['/f126/index*']), callback='parse_item', follow=True, restrict_xpaths=('//a[starts-with(@title,"Next ")]')]


    def parse_item(self, response):

        hxs = HtmlXPathSelector(response)


        ads = hxs.select('//table[@id="threadslist"]/tbody/tr/td[@class="alt1"][2]/div')
        items = []
        for ad in ads:
            item = BassItem()
            item['title'] = ad.select('a/text()').extract()
            item['link'] = ad.select('a/@href').extract()
            items.append(item)
        return items

So inside the rule, the XPath '//a[starts-with(@title,"Next ")]' is producing an error and I'm not sure why, since the XPath itself is valid. I'm simply trying to get the spider to crawl each "Next Page" link. Can anyone help me out? Please let me know if you need any other parts of my code.

Tags: xpath scrapy

1 answer

啃猪蹄的小仙女

#2 · 2019-07-24 20:50

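The XPath is not the issue; the syntax of the rule as a whole is incorrect. In the original line, the opening parenthesis of Rule( is never closed before the final ], and restrict_xpaths is an argument of the link extractor, not of Rule. The following rule fixes the syntax error by moving restrict_xpaths into SgmlLinkExtractor, but it should be checked to make sure that it does what is required: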

rules = (
    Rule(SgmlLinkExtractor(allow=['/f126/index*'],
                           restrict_xpaths=('//a[starts-with(@title,"Next ")]')),
         callback='parse_item', follow=True),
)

As a general point, include the actual error message (ideally the full traceback) in your question, since what you think the error is and what the error actually is may well differ.
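For this question, a quick way to see the actual error without running Scrapy at all is to compile the rule on its own. This minimal sketch uses only the built-in compile(), which parses the code but never executes it, so undefined names such as Rule and SgmlLinkExtractor do not matter. ORIGINAL_RULE and FIXED_RULE are copies of the question's and answer's lines; syntax_error is a hypothetical helper name.

```python
# Copies of the question's rule line and the answer's corrected rule.
ORIGINAL_RULE = """
rules = [Rule(SgmlLinkExtractor(allow=['/f126/index*']), callback='parse_item', follow=True, restrict_xpaths=('//a[starts-with(@title,"Next ")]')]
"""

FIXED_RULE = """
rules = (Rule(SgmlLinkExtractor(allow=['/f126/index*'], restrict_xpaths=('//a[starts-with(@title,"Next ")]')),
        callback='parse_item', follow=True, ),
)
"""


def syntax_error(source):
    """Return the SyntaxError raised when compiling `source`, or None if it parses."""
    try:
        # compile() only parses and byte-compiles; names like Rule are never looked up.
        compile(source, "<rule>", "exec")
    except SyntaxError as exc:
        return exc
    return None


for label, source in (("original", ORIGINAL_RULE), ("fixed", FIXED_RULE)):
    err = syntax_error(source)
    if err is None:
        print(label + ": parses OK")
    else:
        print("%s: SyntaxError on line %s: %s" % (label, err.lineno, err.msg))
```

The exact wording of the SyntaxError message differs between Python versions, but the original line fails to parse regardless of what the XPath string contains, while the corrected rule parses.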
