Handling exceptions from urllib2 and mechanize in

2019-04-02 10:14发布

问题:

I am a novice at using exception handling. I am using the mechanize module to scrape several websites. My program fails frequently because the connection is slow and because the requests timeout. I would like to be able to retry the website (on a timeout, for instance) up to 5 times after 30 second delays between each try.

I looked at this stackoverflow answer and can see how I can handle various exceptions. I also see (although it looks very clumsy) how I can put the try/exception inside a while loop to control the 5 attempts ... but I do not understand how to break out of the loop, or "continue" when the connection is successful and no exception has been thrown.

from mechanize import Browser
import time

b = Browser()
tried=0
while tried < 5:
  try:
    r=b.open('http://www.google.com/foobar')
  except (mechanize.HTTPError,mechanize.URLError) as e:
    if isinstance(e,mechanize.HTTPError):
      print e.code
      tried += 1
      sleep(30)
      if tried > 4:
        exit()
    else:
      print e.reason.args
      tried += 1
      sleep(30)
      if tried > 4:
        exit()

print "How can I get to here after the first successful b.open() attempt????"

I would appreciate advice about (1) how to break out of the loop on a successful open, and (2) how to make the whole block less clumsy/more elegant.

回答1:

You don't have to repeat things in the except block that you do in either case.

from mechanize import Browser
import time

b = Browser()
tried=0
while True:
  try:
    r=b.open('http://www.google.com/foobar')
  except (mechanize.HTTPError,mechanize.URLError) as e:
      tried += 1
    if isinstance(e,mechanize.HTTPError):
      print e.code
    else:
      print e.reason.args
    if tried > 4:
      exit()
    sleep(30)
    continue
  break

Also, you may be able to use while not r: depending on what Browser.open returns.

Edit: roadierich showed a more elegant way with

try:
  doSomething()
  break
except:
  ...

Because an error skips to the except block.



回答2:

Your first question can be done with break:

while tried < 5:
  try:
    r=b.open('http://www.google.com/foobar')
    break
  except #etc...

The real question, however, is do you really want to: this is what is known as "Spaghetti code": if you try to graph execution through the program, it looks like a plate of spaghetti.

The real (imho) problem you are having, is that your logic for exiting the while loop is flawed. Rather than trying to stop after a number of attempts (a condition that never occurs because you're already exiting anyway), loop until you've got a connection:

#imports etc

tried=0
connected = False
while not Connected:
    try:
        r = b.open('http://www.google.com/foobar')
        connected = true # if line above fails, this is never executed
    except mechanize.HTTPError as e:
        print e.code            
        tried += 1        
        if tried > 4:
            exit() 
        sleep(30)

    except mechanize.URLError as e:
        print e.reason.args            
        tried += 1
        if tried > 4:
            exit()        
        sleep(30)

 #Do stuff


回答3:

For your first question, you simply want the "break" keyword, which breaks out of a loop.

For the second question, you can have several "except" clauses for one "try", for different kinds of exceptions. This replaces your isinstance() check and will make your code cleaner.