how to handle 302 redirect in scrapy

I am receiving a 302 response from a server while scrapping a website:

2014-04-01 21:31:51+0200 [ahrefs-h] DEBUG: Redirecting (302) to <GET http://www.domain.com/Site_Abuse/DeadEnd.htm> from <GET http://domain.com/wps/showmodel.asp?Type=15&make=damc&a=664&b=51&c=0>

I want to send request to GET urls instead of being redirected. Now I found this middleware:

https://github.com/scrapy/scrapy/blob/master/scrapy/contrib/downloadermiddleware/redirect.py#L31

I added this redirect code to my middleware.py file and I added this into settings.py:

DOWNLOADER_MIDDLEWARES = {
 'street.middlewares.RandomUserAgentMiddleware': 400,
 'street.middlewares.RedirectMiddleware': 100,
 'scrapy.contrib.downloadermiddleware.useragent.UserAgentMiddleware': None,
}

But I am still getting redirected. Is that all I have to do in order to get this middleware working? Do I miss something?

标签： python scrapy http-status-code-302

3条回答

手持菜刀，她持情操

2楼-- · 2019-01-19 14:23

I added this redirect code to my middleware.py file and I added this into settings.py:

DOWNLOADER_MIDDLEWARES_BASE says that RedirectMiddleware is already enabled by default, so what you did didn't matter.

I want to send request to GET urls instead of being redirected.

How? The server responds with 302 on your GET request. If you do GET on the same URL again you will be redirected again.

What are you trying to achieve?

If you want to not be redirected, see these questions:

0人赞添加讨论(0) 举报

我只想做你的唯一

3楼-- · 2019-01-19 14:32

I had an issue with infinite loop on redirections when using HTTPCACHE_ENABLED = True. I managed to avoid the problem by setting HTTPCACHE_IGNORE_HTTP_CODES = [301,302].

0人赞添加讨论(0) 举报

Fickle 薄情

4楼-- · 2019-01-19 14:41

Forgot about middlewares in this scenario, this will do the trick:

meta = {'dont_redirect': True,'handle_httpstatus_list': [302]}

That said, you will need to include meta parameter when you yield your request:

yield Request(item['link'],meta = {
                  'dont_redirect': True,
                  'handle_httpstatus_list': [302]
              }, callback=self.your_callback)

0人赞添加讨论(0) 举报

how to handle 302 redirect in scrapy

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间