Handling exceptions from urllib2 and mechanize in

Posted 2019-04-02 09:47

I am a novice at exception handling. I am using the mechanize module to scrape several websites. My program fails frequently because the connection is slow and requests time out. I would like to retry a website (on a timeout, for instance) up to 5 times, with a 30-second delay between tries.

I looked at this Stack Overflow answer and can see how to handle various exceptions. I can also see (although it looks very clumsy) how to put the try/except inside a while loop to control the 5 attempts, but I do not understand how to break out of the loop, or "continue", once the connection succeeds and no exception has been thrown.

import time

import mechanize  # needed for mechanize.HTTPError / mechanize.URLError below
from mechanize import Browser

b = Browser()
tried=0
while tried < 5:
  try:
    r=b.open('http://www.google.com/foobar')
  except (mechanize.HTTPError,mechanize.URLError) as e:
    if isinstance(e,mechanize.HTTPError):
      print e.code
      tried += 1
      time.sleep(30)
      if tried > 4:
        exit()
    else:
      print e.reason.args
      tried += 1
      time.sleep(30)
      if tried > 4:
        exit()

print "How can I get to here after the first successful b.open() attempt????"

I would appreciate advice about (1) how to break out of the loop on a successful open, and (2) how to make the whole block less clumsy/more elegant.

3 answers
狗以群分
#2 · 2019-04-02 10:14

For your first question, you simply want the "break" keyword, which breaks out of a loop.

For the second question, you can have several "except" clauses for one "try", for different kinds of exceptions. This replaces your isinstance() check and will make your code cleaner.
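Both suggestions can be sketched together without touching the network. The code below is only an illustration under stated assumptions: `FakeURLError`, `FakeHTTPError`, `make_flaky_open`, and `fetch` are hypothetical names standing in for the mechanize exceptions and `Browser.open`. The subclass relationship mirrors urllib2, where `HTTPError` derives from `URLError`, so the more specific clause has to come first.

```python
import time


class FakeURLError(Exception):
    """Hypothetical stand-in for mechanize.URLError."""
    def __init__(self, reason):
        Exception.__init__(self, reason)
        self.reason = reason


class FakeHTTPError(FakeURLError):
    """Hypothetical stand-in for mechanize.HTTPError (a URLError subclass in urllib2)."""
    def __init__(self, code):
        FakeURLError.__init__(self, "HTTP %s" % code)
        self.code = code


def make_flaky_open(failures):
    """Return an open()-like function that raises each queued error, then succeeds."""
    pending = list(failures)

    def flaky_open(url):
        if pending:
            raise pending.pop(0)
        return "response for " + url

    return flaky_open


def fetch(open_fn, url, max_tries=5, delay=0):
    """Try open_fn(url) up to max_tries times; return (response, failed_attempts)."""
    tried = 0
    response = None
    while tried < max_tries:
        try:
            response = open_fn(url)
            break  # success: leave the loop, skipping the bookkeeping below
        except FakeHTTPError as e:  # specific subclass first
            print("HTTP error %s" % e.code)
        except FakeURLError as e:   # then the more general error
            print("URL error %s" % e.reason)
        # Only reached after a failure, because break skips it on success.
        tried += 1
        time.sleep(delay)
    return response, tried


opener = make_flaky_open([FakeHTTPError(503), FakeURLError("timed out")])
result, failed = fetch(opener, "http://www.google.com/foobar")
```

Here the first two calls fail and the third succeeds, so the loop ends through `break` with `failed` equal to 2. In the real program the delay would be 30 seconds rather than the 0 used to keep the sketch quick.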

我欲成王,谁敢阻挡
#3 · 2019-04-02 10:20

Your first question can be done with break:

while tried < 5:
  try:
    r=b.open('http://www.google.com/foobar')
    break
  except #etc...

The real question, however, is whether you really want to. This is what is known as "spaghetti code": if you try to graph execution through the program, it looks like a plate of spaghetti.

The real (imho) problem you are having is that your logic for exiting the while loop is flawed. Rather than trying to stop after a number of attempts (a condition that never occurs because you're already exiting anyway), loop until you've got a connection:

import time

import mechanize

b = mechanize.Browser()
tried = 0
connected = False
while not connected:
    try:
        r = b.open('http://www.google.com/foobar')
        connected = True # if line above fails, this is never executed
    except mechanize.HTTPError as e:
        print e.code
        tried += 1
        if tried > 4:
            exit()
        time.sleep(30)

    except mechanize.URLError as e:
        print e.reason.args
        tried += 1
        if tried > 4:
            exit()
        time.sleep(30)

#Do stuff
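The counter, the sleep, and the give-up check are the same in both except blocks, so one possible refinement is to move them into a small helper. This is a different technique from the flag-based loop above, shown only as a sketch: `retry_call` and its parameters are hypothetical and not part of mechanize, and `flaky` is a made-up function that fails twice before succeeding. Instead of calling `exit()`, the helper re-raises the last error so the caller decides what to do.

```python
import time


def retry_call(func, args=(), retry_on=(Exception,), attempts=5, delay=30,
               sleep=time.sleep):
    """Call func(*args), retrying when it raises one of the retry_on exceptions.

    Returns the result of the first successful call. Once `attempts` calls
    have failed, the last exception is re-raised to the caller.
    """
    for attempt in range(1, attempts + 1):
        try:
            return func(*args)  # success ends the loop via return
        except retry_on as e:
            if attempt == attempts:
                raise  # out of attempts: propagate the last error
            print("attempt %d failed: %r" % (attempt, e))
            sleep(delay)


# Demonstration with a made-up function that fails twice, then succeeds.
calls = []


def flaky(url):
    calls.append(url)
    if len(calls) < 3:
        raise IOError("timed out")
    return "ok: " + url


waits = []  # record the requested delays instead of really sleeping
result = retry_call(flaky, ("http://www.google.com/foobar",),
                    retry_on=(IOError,), sleep=waits.append)
```

With mechanize, the call would presumably look like `retry_call(b.open, (url,), retry_on=(mechanize.HTTPError, mechanize.URLError))`. Passing `sleep` as a parameter is a design choice that lets the loop be exercised without waiting 30 seconds per attempt.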
ゆ 、 Hurt°
#4 · 2019-04-02 10:33

You don't have to repeat, in each branch of the except block, the things you do in both cases.

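import time

import mechanize  # needed for mechanize.HTTPError / mechanize.URLError below
from mechanize import Browser

b = Browser()
tried=0
while True:
  try:
    r=b.open('http://www.google.com/foobar')
  except (mechanize.HTTPError,mechanize.URLError) as e:
    tried += 1
    if isinstance(e,mechanize.HTTPError):
      print e.code
    else:
      print e.reason.args
    if tried > 4:
      exit()
    time.sleep(30)
    continue  # failed: go around the loop again
  break  # reached only when b.open() raised nothing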

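Also, you may be able to use while not r: as the loop condition, depending on what Browser.open returns. For that to work, r has to be given a false value such as None before the loop starts.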
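The caveat about the return value matters because `while not r` tests truthiness, and a response object could in principle be falsy even after a successful open. A minimal sketch of the safer `r is None` test follows. `EmptyResponse`, `open_with_flag`, and `always_empty` are hypothetical names, `IOError` stands in for the mechanize exceptions (urllib2's `URLError` derives from it), and the delay between tries is left out to keep the sketch short.

```python
class EmptyResponse(object):
    """Hypothetical response object that happens to evaluate as false."""
    def __len__(self):
        return 0


def open_with_flag(open_fn, url, max_tries=5):
    """Loop until open_fn returns something, or max_tries failures occur."""
    r = None   # define r before the loop so the condition can be tested
    tried = 0
    # "is None" stops as soon as any object is returned, even a falsy one;
    # "while not r" would keep looping on such an object.
    while r is None and tried < max_tries:
        try:
            r = open_fn(url)
        except IOError as e:
            tried += 1
            print("attempt %d failed: %s" % (tried, e))
    return r, tried


def always_empty(url):
    return EmptyResponse()


response, failures = open_with_flag(always_empty, "http://www.google.com/foobar")
```

In this run the open succeeds on the first call, so `failures` stays at 0 even though `not response` is true.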

Edit: roadierich showed a more elegant way with

try:
  doSomething()
  break
except:
  ...

This works because an error in doSomething() jumps straight to the except block, so break is reached only on success.
