getting Forbidden by robots.txt: scrapy

Published 2019-01-18 01:15

Question:

While crawling a website like https://www.netflix.com, I get:

Forbidden by robots.txt: <GET https://www.netflix.com/>
ERROR: No response downloaded for: https://www.netflix.com/

Answer 1:

Since Scrapy 1.1 (released 2016-05-11), the crawler downloads robots.txt before crawling and obeys it by default. To change this behavior, set ROBOTSTXT_OBEY in your settings.py:

ROBOTSTXT_OBEY = False

Here are the release notes.
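To see what this check actually does: Scrapy's robots.txt middleware parses the site's robots.txt and drops any request the rules disallow, logging "Forbidden by robots.txt". A minimal sketch of an equivalent check using only Python's stdlib urllib.robotparser (the robots.txt rules and URLs below are illustrative, not Netflix's actual rules):

```python
from urllib import robotparser

# Illustrative robots.txt content; a real site serves this at /robots.txt.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Allow: /
"""

rp = robotparser.RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

# With ROBOTSTXT_OBEY = True, Scrapy performs a check like this before each
# request; a disallowed URL is never downloaded.
print(rp.can_fetch("*", "https://example.com/private/page"))  # False
print(rp.can_fetch("*", "https://example.com/public"))        # True
```

Setting ROBOTSTXT_OBEY = False simply skips this check, so every request is downloaded regardless of the site's robots.txt.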



Answer 2:

The first thing to ensure is that you change the user agent in your requests; otherwise the default user agent (which identifies the client as a Scrapy bot) will almost certainly be blocked.
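In Scrapy this is done with the USER_AGENT setting in settings.py. A minimal sketch; the browser-like string below is an illustrative example, not a required or recommended value:

```python
# settings.py
# Override Scrapy's default user agent with a browser-like string
# (the exact string here is an illustrative example).
USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
    "AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/120.0 Safari/537.36"
)
```

This sets the User-Agent header for every request the spider makes; individual requests can still override it via their own headers.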