For those who know wget, it has an option --spider, which allows one to check whether a link is broken or not without actually downloading the page. I would like to do the same thing in Python. My problem is that I have a list of 100'000 links that I want to check at most once a day and at least once a week. In any case, this will generate a lot of unnecessary traffic.
As far as I understand from the urllib2.urlopen() documentation, it does not download the page but only the meta-information. Is this correct? Or is there some other way to do this in a nice manner?
Best,
Troels
Not sure how to do this in Python, but in general you could check the response header and look for a Status-Code of 200. At that point you can stop reading the page and continue with your next link; that way you don't have to download the whole page, just the response headers. See the list of HTTP status codes.
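A minimal sketch of that idea in Python 2 using urllib2 (the function name link_status is mine, not part of any library). Note that urlopen still issues a GET, so closing the response early only avoids reading the body, not making the request itself:

    import urllib2

    def link_status(url):
        """Open the URL but never read the body; the status code
        arrives with the response headers. (Sketch, Python 2.)"""
        try:
            response = urllib2.urlopen(url, timeout=10)
            code = response.getcode()   # e.g. 200 for OK
            response.close()            # stop before reading the body
            return code
        except urllib2.HTTPError as e:
            return e.code               # 404, 500, ...
        except urllib2.URLError:
            return None                 # DNS failure, connection refused, ...

    # print link_status('http://example.com/')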
You should use a HEAD request for this; it asks the web server for the headers only, without the body. See How do you send a HEAD HTTP request in Python 2?
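A minimal sketch of a HEAD request with the Python 2 standard library (httplib); head_status and the example URL are just illustrative:

    import httplib
    from urlparse import urlparse

    def head_status(url):
        """Issue a HEAD request: the server returns only the status
        line and headers, no body. (Sketch, Python 2 httplib.)"""
        parts = urlparse(url)
        conn = httplib.HTTPConnection(parts.netloc, timeout=10)
        try:
            conn.request("HEAD", parts.path or "/")
            return conn.getresponse().status
        finally:
            conn.close()

    # print head_status('http://example.com/index.html')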