Scrapy unable to make Request() callback

Posted 2019-07-03 23:08

I am trying to write a recursive parsing script with Scrapy, but Request() never calls the callback function suppose_to_parse(), nor any other function I pass as the callback value. I have tried different variations but none of them work. Where should I dig?

from scrapy.http import Request
from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector



class joomler(BaseSpider):
    name = "scrapy"
    allowed_domains = ["scrapy.org"]
    start_urls = ["http://blog.scrapy.org/"]


    def parse(self, response):
        print "Working... "+response.url
        hxs = HtmlXPathSelector(response)
        for link in hxs.select('//a/@href').extract():
            if not link.startswith('http://') and not link.startswith('#'):
               url=""
               url=(self.start_urls[0]+link).replace('//','/')
               print url
               yield Request(url, callback=self.suppose_to_parse)


    def suppose_to_parse(self, response):
        print "asdasd"
        print response.url

Tags: python scrapy

2 Answers
来,给爷笑一个 · answered 2019-07-03 23:41

Move the yield outside of the if statement:

for link in hxs.select('//a/@href').extract():
    url = link
    if not link.startswith('http://') and not link.startswith('#'):
        url = (self.start_urls[0] + link).replace('//','/')

    print url
    yield Request(url, callback=self.suppose_to_parse)
等我变得足够好 · answered 2019-07-03 23:41

I'm not an expert, but I tried your code and I think the problem is not with the Request itself: the generated URLs appear to be broken. If you put a few known-good URLs in a list, iterate through them, and yield the Request with the callback, it works fine.
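
For example (a minimal sketch, not the answerer's exact code): the question's string concatenation followed by .replace('//','/') also collapses the "//" in "http://", so relative links end up as "http:/blog.scrapy.org/...". Building the absolute URL with urljoin from the standard library avoids that, assuming the same Python 2 / old-Scrapy style as the question:

from urlparse import urljoin  # urllib.parse.urljoin on Python 3

def parse(self, response):
    hxs = HtmlXPathSelector(response)
    for link in hxs.select('//a/@href').extract():
        if link.startswith('#'):
            continue  # skip in-page anchors
        # urljoin resolves relative links against the current page URL
        # without mangling the scheme of the base URL
        url = urljoin(response.url, link)
        print url
        yield Request(url, callback=self.suppose_to_parse)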
