Avoiding redirection

I'm trying to parse a site(written in ASP) and the crawler gets redirected to the main site. But what I'd like to do is to parse the given url, not the redirected one. Is there a way to do this?. I tried adding "REDIRECT=False" to the settings.py file without success.

Here's some output from the crawler:

2011-09-24 20:01:11-0300 [coto] DEBUG: Redirecting (302) to <GET http://www.cotodigital.com.ar/default.asp> from <GET http://www.cotodigital.com.ar/l.asp?cat=500&id=500>
2011-09-24 20:01:11-0300 [coto] DEBUG: Redirecting (302) to <GET http://www.cotodigital.com.ar/default.asp> from <GET http://www.cotodigital.com.ar/l.asp?cat=1513&id=1513>
2011-09-24 20:01:11-0300 [coto] DEBUG: Redirecting (302) to <GET http://www.cotodigital.com.ar/default.asp> from <GET http://www.cotodigital.com.ar/l.asp?cat=476&id=476>
2011-09-24 20:01:11-0300 [coto] DEBUG: Redirecting (302) to <GET http://www.cotodigital.com.ar/default.asp> from <GET http://www.cotodigital.com.ar/l.asp?cat=472&id=472>
2011-09-24 20:01:11-0300 [coto] DEBUG: Redirecting (302) to <GET http://www.cotodigital.com.ar/default.asp> from <GET http://www.cotodigital.com.ar/l.asp?cat=457&id=457>
2011-09-24 20:01:11-0300 [coto] DEBUG: Redirecting (302) to <GET http://www.cotodigital.com.ar/default.asp> from <GET http://www.cotodigital.com.ar/l.asp?cat=1097&id=1097>

标签： python scrapy

2条回答

别忘想泡老子

2楼-- · 2019-02-26 01:02

http://www.cotodigital.com.ar/l.asp?cat=1097&id=1097 redirects to http://www.cotodigital.com.ar/default.asp because HTTP response said to so. This happens because asp code is checking for some condition - a wrong page, or cookies, or user-agent, or referrer. Check the mentioned conditions.

UPDATE: Just checked in my browser: the browser is also redirected to the main page, where i click 'Skip ads'. After that it works OK.

This means it sets some cookies, without which it redirects to the main page.

0人赞添加讨论(0) 举报

欢心

3楼-- · 2019-02-26 01:03

The original URL has nothing to scrape. It returned 302, meaning there is no body, and the Location header indicates where to redirect to. You need to figure out how to access the URL without being redirected, perhaps by authenticating.

0人赞添加讨论(0) 举报

Avoiding redirection

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间