How to catch a 404 error in urllib.urlretrieve

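Background: I am using urllib.urlretrieve, as opposed to any other function in the urllib* modules, because of its hook function support (see reporthook below), which I use to display a textual progress bar. This is Python >= 2.6.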

>>> urllib.urlretrieve(url[, filename[, reporthook[, data]]])
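For reference, a minimal sketch of what such a hook could look like. urlretrieve calls the hook with three arguments: the number of blocks transferred so far, the block size in bytes, and the total size (which may be -1 when the size is unknown). The helper name format_progress and the width parameter are hypothetical, chosen only for illustration; my_report_hook is just a placeholder name.

```python
import sys


def format_progress(block_num, block_size, total_size, width=30):
    """Return a one-line text progress bar for the given transfer state."""
    if total_size <= 0:
        # Total size unknown: fall back to a plain byte count.
        return "%d bytes" % (block_num * block_size)
    # The last block is usually partial, so cap the count at the total size.
    done = min(block_num * block_size, total_size)
    filled = width * done // total_size
    percent = 100 * done // total_size
    return "[%s%s] %3d%%" % ("#" * filled, "-" * (width - filled), percent)


def my_report_hook(block_num, block_size, total_size):
    """Hook with the three-argument signature that urlretrieve expects."""
    # "\r" returns to the start of the line so the bar redraws in place.
    sys.stdout.write("\r" + format_progress(block_num, block_size, total_size))
    sys.stdout.flush()
```

It would be passed as urllib.urlretrieve(url, filename, my_report_hook). Keeping the formatting in a separate function makes the bar easy to check without any network access.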

However, urlretrieve is so dumb that it leaves no way to detect the status of the HTTP request (e.g. was it 404 or 200?).

>>> fn, h = urllib.urlretrieve('http://google.com/foo/bar')
>>> h.items() 
[('date', 'Thu, 20 Aug 2009 20:07:40 GMT'),
 ('expires', '-1'),
 ('content-type', 'text/html; charset=ISO-8859-1'),
 ('server', 'gws'),
 ('cache-control', 'private, max-age=0')]
>>> h.status
''
>>>

What is the best known way to download a remote HTTP file with hook-like support (to show a progress bar) and decent HTTP error handling?

Answer 1:

Check out the complete code of urllib.urlretrieve:

def urlretrieve(url, filename=None, reporthook=None, data=None):
  global _urlopener
  if not _urlopener:
    _urlopener = FancyURLopener()
  return _urlopener.retrieve(url, filename, reporthook, data)

In other words, you can use urllib.FancyURLopener (it is part of the public urllib API). You can override http_error_default to detect 404s:

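import urllib

class MyURLopener(urllib.FancyURLopener):
  def http_error_default(self, url, fp, errcode, errmsg, headers):
    # handle errors the way you'd like to; for example, raise instead of saving the error page
    fp.close()
    raise IOError("HTTP error %d: %s" % (errcode, errmsg))

# url and my_report_hook are assumed to be defined by the caller
fn, h = MyURLopener().retrieve(url, reporthook=my_report_hook)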

Answer 2:

You should use:

import urllib2

try:
    resp = urllib2.urlopen("http://www.google.com/this-gives-a-404/")
except urllib2.URLError, e:
    if not hasattr(e, "code"):
        raise
    resp = e

print "Gave", resp.code, resp.msg
print "=" * 80
print resp.read(80)

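Edit: The rationale here is that unless you expect the exceptional status, it is an exception for it to happen, and you probably didn't even think about it. So instead of letting your code continue to run after an unsuccessful request, the default behavior is, quite sensibly, to halt its execution.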
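To get both the progress hook and the error handling, one option is to read the urlopen response in blocks yourself. Below is a rough sketch under stated assumptions, not a drop-in replacement for urlretrieve. The function name download and its block_size and opener parameters are hypothetical; opener exists only so the function can be exercised without network access. The imports fall back to urllib2 on Python 2 and use urllib.request on Python 3.

```python
try:  # Python 3
    from urllib.request import urlopen
    from urllib.error import HTTPError
except ImportError:  # Python 2, as used in this thread
    from urllib2 import urlopen, HTTPError


def download(url, filename, reporthook=None, block_size=8192, opener=urlopen):
    """Save url to filename, calling reporthook the way urlretrieve does.

    Returns the response headers. An HTTP error status such as 404 is
    raised by urlopen as HTTPError before the output file is created.
    """
    resp = opener(url)  # urlopen raises HTTPError for 4xx/5xx responses
    try:
        # Report -1 as the total size when there is no Content-Length header.
        total_size = int(resp.info().get("Content-Length", -1))
        block_num = 0
        if reporthook:
            reporthook(block_num, block_size, total_size)
        with open(filename, "wb") as out:
            while True:
                block = resp.read(block_size)
                if not block:
                    break
                out.write(block)
                block_num += 1
                if reporthook:
                    reporthook(block_num, block_size, total_size)
        return resp.info()
    finally:
        resp.close()
```

A caller would wrap download(url, filename, hook) in try/except HTTPError and inspect e.code, just as the snippet above inspects resp.code. Because the exception is raised before open() is called, a 404 does not leave an error page saved on disk.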