Python urllib freezes with specific URL

Posted 2019-07-07 17:07

Question:

I am trying to fetch a page, but urlopen hangs and never returns anything, even though the page is very light and opens in any browser without problems.

import urllib.request
with urllib.request.urlopen("http://www.planalto.gov.br/ccivil_03/_Ato2007-2010/2008/Lei/L11882.htm") as response:
    print(response.read())

This simple code just freezes while retrieving the response, but opening http://www.planalto.gov.br/ccivil_03/_Ato2007-2010/2008/Lei/L11882.htm in a browser works without any problem.

Answer 1:

www.planalto.gov.br is using user-agent detection. If you send a valid User-Agent header, the request succeeds. The urllib library didn't crash; it's just waiting for a response.

curl -H "User-Agent:Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36" http://www.planalto.gov.br/ccivil_03/_Ato2007-2010/2008/Lei/L11882.htm

worked just fine for me but

curl http://www.planalto.gov.br/ccivil_03/_Ato2007-2010/2008/Lei/L11882.htm

did not.

As RPGillespie said above, add the User-Agent header to the request, either with urllib2 (urllib.request in Python 3, which the question uses) or with requests (see "How do I set headers using python's urllib?" for more information).
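
For the Python 3 code in the question, a minimal sketch of that fix might look like the following. The helper names build_request and fetch are made up for illustration, the User-Agent string is simply the one from the curl command, and the 10-second timeout is an assumed value that should make a stalled connection raise an error rather than block indefinitely. I haven't verified that the site still behaves this way.

```python
import urllib.request

URL = "http://www.planalto.gov.br/ccivil_03/_Ato2007-2010/2008/Lei/L11882.htm"

# Browser-like User-Agent copied from the curl command; any common browser string is assumed to work.
USER_AGENT = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36"
)


def build_request(url, user_agent=USER_AGENT):
    """Hypothetical helper: wrap the URL in a Request carrying an explicit User-Agent."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch(url, timeout=10):
    """Hypothetical helper: fetch the page body as bytes.

    timeout is in seconds and is an assumed value, not something from the thread.
    """
    with urllib.request.urlopen(build_request(url), timeout=timeout) as response:
        return response.read()


if __name__ == "__main__":
    print(fetch(URL))
```

Without the headers argument, urllib sends its default "Python-urllib/x.y" User-Agent, which is presumably what the server is reacting to.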