Handling exceptions from urllib2 and mechanize in

I am a novice at using exception handling. I am using the mechanize module to scrape several websites. My program fails frequently because the connection is slow and because the requests timeout. I would like to be able to retry the website (on a timeout, for instance) up to 5 times after 30 second delays between each try.

I looked at this stackoverflow answer and can see how I can handle various exceptions. I also see (although it looks very clumsy) how I can put the try/exception inside a while loop to control the 5 attempts ... but I do not understand how to break out of the loop, or "continue" when the connection is successful and no exception has been thrown.

from mechanize import Browser
import time

b = Browser()
tried=0
while tried < 5:
  try:
    r=b.open('http://www.google.com/foobar')
  except (mechanize.HTTPError,mechanize.URLError) as e:
    if isinstance(e,mechanize.HTTPError):
      print e.code
      tried += 1
      sleep(30)
      if tried > 4:
        exit()
    else:
      print e.reason.args
      tried += 1
      sleep(30)
      if tried > 4:
        exit()

print "How can I get to here after the first successful b.open() attempt????"

I would appreciate advice about (1) how to break out of the loop on a successful open, and (2) how to make the whole block less clumsy/more elegant.

标签： python exception-handling mechanize-python

3条回答

狗以群分

2楼-- · 2019-04-02 10:14

For your first question, you simply want the "break" keyword, which breaks out of a loop.

For the second question, you can have several "except" clauses for one "try", for different kinds of exceptions. This replaces your isinstance() check and will make your code cleaner.

0人赞添加讨论(0) 举报

我欲成王，谁敢阻挡

3楼-- · 2019-04-02 10:20

Your first question can be done with break:

while tried < 5:
  try:
    r=b.open('http://www.google.com/foobar')
    break
  except #etc...

The real question, however, is do you really want to: this is what is known as "Spaghetti code": if you try to graph execution through the program, it looks like a plate of spaghetti.

The real (imho) problem you are having, is that your logic for exiting the while loop is flawed. Rather than trying to stop after a number of attempts (a condition that never occurs because you're already exiting anyway), loop until you've got a connection:

#imports etc

tried=0
connected = False
while not Connected:
    try:
        r = b.open('http://www.google.com/foobar')
        connected = true # if line above fails, this is never executed
    except mechanize.HTTPError as e:
        print e.code            
        tried += 1        
        if tried > 4:
            exit() 
        sleep(30)

    except mechanize.URLError as e:
        print e.reason.args            
        tried += 1
        if tried > 4:
            exit()        
        sleep(30)

 #Do stuff

0人赞添加讨论(0) 举报

ゆ、 Hurt°

4楼-- · 2019-04-02 10:33

You don't have to repeat things in the except block that you do in either case.

from mechanize import Browser
import time

b = Browser()
tried=0
while True:
  try:
    r=b.open('http://www.google.com/foobar')
  except (mechanize.HTTPError,mechanize.URLError) as e:
      tried += 1
    if isinstance(e,mechanize.HTTPError):
      print e.code
    else:
      print e.reason.args
    if tried > 4:
      exit()
    sleep(30)
    continue
  break

Also, you may be able to use while not r: depending on what Browser.open returns.

Edit: roadierich showed a more elegant way with

try:
  doSomething()
  break
except:
  ...

Because an error skips to the except block.

0人赞添加讨论(0) 举报

Handling exceptions from urllib2 and mechanize in

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间