Scrapy: crawl HTTP header data only

Posted 2019-04-01 10:58

(How) can I make Scrapy download only the header data of a website (for checking purposes, etc.)?

I've tried disabling some downloader middlewares, but that doesn't seem to work.

1 answer
Luminary・发光体
Reply #2 · 2019-04-01 11:11

As @alexce said, you can issue HEAD requests instead of the default GET:

from scrapy import Request
Request(url, method="HEAD")

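UPDATE: If you want to use HEAD requests for your start_urls, you will need to override the make_requests_from_url method of your spider. Note that this method was deprecated in later Scrapy releases, where overriding start_requests is the usual alternative: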

def make_requests_from_url(self, url):
    return Request(url, method='HEAD', dont_filter=True)