Scrapy: crawl HTTP header data only

Posted 2019-04-01 10:58

(How) can I make Scrapy download only the header data of a website (for checking purposes, etc.)?

I've tried disabling some downloader middlewares, but that doesn't seem to work.

Tags: python http-headers scrapy

1 answer

#2 · 2019-04-01 11:11

As @alexce said, you can issue HEAD requests instead of the default GET:

Request(url, method="HEAD")

UPDATE: If you want to use HEAD requests for your start_urls, you will need to override the make_requests_from_url method:

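# Inside your spider class (requires: from scrapy import Request)
def make_requests_from_url(self, url):
    # dont_filter=True keeps the duplicate filter from dropping the request
    return Request(url, method='HEAD', dont_filter=True)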
