Overriding urllib2.HTTPError or urllib.error.HTTPError

Posted 2019-01-16 07:35

I receive a 'HTTP Error 500: Internal Server Error' response, but I still want to read the data inside the error HTML.

With Python 2.6, I normally fetch a page using:

import urllib2
url = "http://google.com"
data = urllib2.urlopen(url)
data = data.read()

When attempting to use this on the failing URL, I get the exception urllib2.HTTPError:

urllib2.HTTPError: HTTP Error 500: Internal Server Error

How can I fetch such error pages (with or without urllib2), all while they are returning Internal Server Errors?

Note that with Python 3, the corresponding exception is urllib.error.HTTPError.
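
For reference, a minimal Python 3 sketch of the same pattern (reusing the example URL above) might look like this:

import urllib.request
import urllib.error

url = "http://google.com"
try:
    data = urllib.request.urlopen(url).read()
except urllib.error.HTTPError as error:
    # The exception object is file-like; read() returns the error page body
    data = error.read()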

3 Answers
我欲成王,谁敢阻挡
#2 · 2019-01-16 08:18

The HTTPError is a file-like object. You can catch it and then read its contents.

try:
    resp = urllib2.urlopen(url)
    contents = resp.read()
except urllib2.HTTPError as error:
    # The exception is file-like, so its body can be read just like a response
    contents = error.read()
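
The exception also exposes the HTTP status via error.code, so a sketch along these lines (assuming url is defined as in the question) can keep the body only for a 500:

try:
    contents = urllib2.urlopen(url).read()
except urllib2.HTTPError as error:
    # error.code is the numeric HTTP status; error.read() is the body the server sent
    if error.code == 500:
        contents = error.read()
    else:
        raise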
唯我独甜
#3 · 2019-01-16 08:27

If you mean you want to read the body of the 500:

request = urllib2.Request(url, data, headers)  # data and headers as appropriate for your request
try:
    resp = urllib2.urlopen(request)
    print resp.read()
except urllib2.HTTPError as error:
    print "ERROR:", error.read()

In your case, you don't need to build up the request. Just do

try:
    resp = urllib2.urlopen(url)
    print resp.read()
except urllib2.HTTPError as error:
    print "ERROR:", error.read()

So you don't override urllib2.HTTPError; you just handle the exception.
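
As a sketch of that point, the handling can be wrapped in a small hypothetical helper that returns the body whatever the status:

def fetch_body(url):
    """Return the response body, even when the server answers with an error status."""
    try:
        return urllib2.urlopen(url).read()
    except urllib2.HTTPError as error:
        # The error page the server sent back is still readable here
        return error.read()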

smile是对你的礼貌
#4 · 2019-01-16 08:40
import urllib2

alist = ['http://someurl.com']

def testUrl():
    # Collect a "URL reason" string for every URL that fails to open
    errList = []
    for URL in alist:
        try:
            urllib2.urlopen(URL)
        except urllib2.URLError as err:
            # HTTPError is a subclass of URLError, so HTTP errors land here as well
            errList.append(URL + " " + str(err.reason))
    return "\n".join(errList)

testUrl()
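
If the goal is also to capture the error page body, as in the original question, a variant of the loop above might look like this sketch (the URL in alist is still a placeholder, and fetchAll is a hypothetical name):

def fetchAll():
    # For each URL, keep (url, status, body); the body of an HTTP error is read from the exception
    results = []
    for URL in alist:
        try:
            resp = urllib2.urlopen(URL)
            results.append((URL, resp.getcode(), resp.read()))
        except urllib2.HTTPError as err:
            results.append((URL, err.code, err.read()))
    return results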