I have a problem with Scrapy. If a request fails (e.g. 404, 500), how can I make an alternative request? For example, when two links can provide the price info and one of them fails, I want to request the other automatically.
Answer 1:
Use "errback" in the Request like
errback=self.error_handler
where error_handler is a function (just like callback function) in this function check the error code and make the alternative Request.
see errback in the scrapy documentation: http://doc.scrapy.org/en/latest/topics/request-response.html
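To make that concrete, here is a minimal sketch (not from the original answer) of an errback-based fallback, assuming a modern scrapy.Spider; the spider name, the two example URLs, and the parse_price/handle_failure method names are hypothetical:

```python
import scrapy
from scrapy.spidermiddlewares.httperror import HttpError


class PriceSpider(scrapy.Spider):
    name = "price_spider"

    # Hypothetical primary and backup pages carrying the same price info.
    primary_url = "http://example.com/price-source-1"
    backup_url = "http://example.com/price-source-2"

    def start_requests(self):
        # errback is called when the request fails (HTTP 4xx/5xx, DNS error, timeout, ...).
        yield scrapy.Request(
            self.primary_url,
            callback=self.parse_price,
            errback=self.handle_failure,
        )

    def parse_price(self, response):
        # Normal extraction logic would go here.
        self.logger.info("Parsing price page: %s", response.url)

    def handle_failure(self, failure):
        # If the primary source answered with an HTTP error status, fall back to the
        # backup URL; the yielded Request is scheduled just like one returned from
        # a regular callback.
        if failure.check(HttpError):
            status = failure.value.response.status
            self.logger.warning("Primary source failed with %s, trying backup", status)
            yield scrapy.Request(self.backup_url, callback=self.parse_price)
```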
Answer 2:
Just set handle_httpstatus_list = [404, 500] and check for the status code in the parse method. Here's an example:
```python
from scrapy.http import Request
from scrapy.spider import BaseSpider


class MySpider(BaseSpider):
    handle_httpstatus_list = [404, 500]
    name = "my_crawler"
    start_urls = ["http://github.com/illegal_username"]

    def parse(self, response):
        if response.status in self.handle_httpstatus_list:
            return Request(url="https://github.com/kennethreitz/", callback=self.after_404)

    def after_404(self, response):
        print response.url
        # parse the page and extract items
```
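Setting handle_httpstatus_list tells Scrapy's HttpErrorMiddleware to pass 404 and 500 responses through to the spider instead of filtering them out, so parse gets a chance to issue the fallback Request; without it, those responses would never reach the callback.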
Also see:
- How to get the scrapy failure URLs?
- Scrapy and response status code: how to check against it?
- How to retry for 404 link not found in scrapy?
Hope that helps.