I am receiving a 302 response from a server while scraping a website:
2014-04-01 21:31:51+0200 [ahrefs-h] DEBUG: Redirecting (302) to <GET http://www.domain.com/Site_Abuse/DeadEnd.htm> from <GET http://domain.com/wps/showmodel.asp?Type=15&make=damc&a=664&b=51&c=0>
I want to send requests to the GET URLs instead of being redirected. I found this middleware:
https://github.com/scrapy/scrapy/blob/master/scrapy/contrib/downloadermiddleware/redirect.py#L31
I added this redirect code to my middleware.py file and I added this into settings.py:
DOWNLOADER_MIDDLEWARES = {
    'street.middlewares.RandomUserAgentMiddleware': 400,
    'street.middlewares.RedirectMiddleware': 100,
    'scrapy.contrib.downloadermiddleware.useragent.UserAgentMiddleware': None,
}
But I am still getting redirected. Is that all I have to do to get this middleware working? Am I missing something?
Forget about middlewares in this scenario; this will do the trick:
meta = {'dont_redirect': True, 'handle_httpstatus_list': [302]}
That said, you will need to pass the meta parameter when you yield your request:
yield Request(
    item['link'],
    meta={'dont_redirect': True, 'handle_httpstatus_list': [302]},
    callback=self.your_callback,
)
"I added this redirect code to my middleware.py file and I added this into settings.py:"
DOWNLOADER_MIDDLEWARES_BASE says that RedirectMiddleware is already enabled by default, so what you did didn't matter.
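If the aim is to stop the built-in middleware from following redirects for the whole project, a settings sketch along these lines may help. It is not a drop-in fix: the `street.middlewares.RedirectMiddleware` entry is the custom class from the question, the `scrapy.contrib...` path matches the Scrapy version in the question (newer releases use `scrapy.downloadermiddlewares.redirect.RedirectMiddleware`), and the priority of 600 mirrors the built-in default.

```python
# settings.py (sketch; use ONE of the two options below)

# Option 1: switch redirect handling off entirely.
REDIRECT_ENABLED = False

# Option 2: replace the built-in middleware with a custom one.
# Mapping the built-in class to None disables it; without that,
# both middlewares run and the built-in one still redirects.
DOWNLOADER_MIDDLEWARES = {
    'scrapy.contrib.downloadermiddleware.redirect.RedirectMiddleware': None,
    'street.middlewares.RedirectMiddleware': 600,
}
```

With redirects switched off, 302 responses are still dropped before reaching callbacks unless the status is allowed, for example through `handle_httpstatus_list`.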
"I want to send request to GET urls instead of being redirected."
How? The server responds with 302 to your GET request. If you do a GET on the same URL again, you will be redirected again.
What are you trying to achieve?
If you want to not be redirected, see these questions:
- Avoiding redirection
- Facebook url returning an mobile version url response in scrapy
- How to avoid redirection of the webcrawler to the mobile edition?
I had an issue with an infinite loop on redirections when using HTTPCACHE_ENABLED = True. I managed to avoid the problem by setting HTTPCACHE_IGNORE_HTTP_CODES = [301,302].
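For reference, the two settings from this answer would sit together in settings.py roughly like this (a sketch; any other cache settings are left at their defaults):

```python
# settings.py (sketch)
HTTPCACHE_ENABLED = True
# Do not store redirect responses in the HTTP cache, so a cached
# 301/302 cannot be replayed over and over.
HTTPCACHE_IGNORE_HTTP_CODES = [301, 302]
```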