Scrapy: How to print request referrer

Posted 2020-02-26 04:51

Is it possible to get the request referrer from the response object in the parse function?

Thanks

Tags: python scrapy
2 answers
chillily
#2 · 2020-02-26 05:34

The question above was asked a long time ago, and it has been answered well.

However, I thought I would add a different answer in case the answer by Rostyslav Dzinko does not apply or does not work in your case.

Let's say that you have two different parser methods:

  1. One parser (let's call it parser_A) parses the list of articles (list page) to extract the article links and other info.
  2. Another parser (let's call it parser_B) extracts the article info from the target article (article page).

If you cannot get the URL of the list page (the referer URL) once you are in parser_B, you can set a custom entry in the headers argument of the request created in parser_A, so that it is available in parser_B, as in the following example:

yield scrapy.Request(url=article_page_url, callback=self.parser_B, dont_filter=True, headers={'referer_url': list_page_url})

Then, in the parser_B method, you can do the following to obtain the list page's URL:

print(response.request.headers.get('referer_url'))

Hope this helps.

Bombasti
#3 · 2020-02-26 05:50

The HTTP Referer field is set by the HTTP client in the request headers, not in the response headers, because it tells the server which page the client came from.

It would be rather odd to receive an HTTP Referer header in a response.

In Scrapy, however, each Response keeps a reference to the Request that produced it in its request attribute, so the following call:

response.request.headers.get('Referer', None)

returns the Referer header if it was set when the request was made, and None otherwise.
