How can I download a webpage with a user agent other than the default one on urllib2.urlopen?
Try this:
Or, a bit shorter:
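As a rough sketch of the two forms, written against Python 3's urllib.request (urllib2.Request in Python 2 takes the same headers argument and has the same add_header method). The URL and the User-Agent string are placeholders, not values from the question.

```python
import urllib.request

URL = "http://www.example.com/"  # placeholder URL (assumption)
USER_AGENT = "Mozilla/5.0"       # placeholder User-Agent (assumption)

# Form 1: create the request, then attach the header.
request_a = urllib.request.Request(URL)
request_a.add_header("User-Agent", USER_AGENT)

# Form 2 (a bit shorter): pass the headers as a dict when creating the request.
request_b = urllib.request.Request(URL, headers={"User-Agent": USER_AGENT})

# Request normalizes header names with str.capitalize(), so the stored key
# is "User-agent" regardless of how it was spelled when added.
print(request_a.get_header("User-agent"))
print(request_b.get_header("User-agent"))

# Passing either request to urllib.request.urlopen() performs the download.
```

Nothing is sent over the network until the request is passed to urlopen(), so the header can be inspected first.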
All of these should work in theory, but (with Python 2.7.2 on Windows at least) any time you try to send a custom User-agent header, urllib2 doesn't send it. If you don't try to send a User-agent header, it sends the default Python/urllib2 one.
None of these methods seem to work for adding User-agent, but they do work for other headers.
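One way to check what actually goes over the wire, rather than guessing, is to point the request at a throwaway local server that echoes back the User-Agent it received. This is only a sketch for Python 3 (the reports above are about Python 2.7.2, where the modules would be urllib2 and BaseHTTPServer instead); the handler and function names are made up for illustration.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class EchoUserAgentHandler(BaseHTTPRequestHandler):
    """Hypothetical handler: replies with the User-Agent header it received."""

    def do_GET(self):
        body = (self.headers.get("User-Agent") or "").encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        # Keep the console quiet; the default implementation logs to stderr.
        pass


def user_agent_seen_by_server(user_agent):
    """Send one request with a custom User-Agent and return what the server saw."""
    server = HTTPServer(("127.0.0.1", 0), EchoUserAgentHandler)  # port 0: any free port
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        url = "http://127.0.0.1:%d/" % server.server_address[1]
        request = urllib.request.Request(url, headers={"User-Agent": user_agent})
        # An empty ProxyHandler keeps any configured proxy out of the picture.
        opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
        with opener.open(request, timeout=5) as response:
            return response.read().decode("utf-8")
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(user_agent_seen_by_server("MyCrawler/1.0"))
```

If the printed value is the custom string, the header is being sent and the problem lies elsewhere (a proxy, a redirect, or the site itself).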
urllib.URLopener() has two relevant properties, namely:
addheaders = [('User-Agent', 'Python-urllib/1.17'), ('Accept', '*/*')]
and
version = 'Python-urllib/1.17'
To fool the website, you need to change both of these values to an accepted User-Agent, for example:
Chrome browser: 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/33.0.1750.149 Safari/537.36'
Googlebot: 'Googlebot/2.1'
Changing just one property does not work, because the website marks the request as suspicious.
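The answer above is about the legacy urllib.URLopener class from Python 2. As a loosely equivalent sketch for Python 3, this uses a different technique: an opener from urllib.request.build_opener(), which has only the addheaders list to replace (there is no separate version attribute on it). The make_opener helper is a made-up name, and the User-Agent is the Chrome string quoted above.

```python
import urllib.request

BROWSER_UA = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/33.0.1750.149 Safari/537.36"
)


def make_opener(user_agent):
    """Hypothetical helper: build an opener that advertises `user_agent`."""
    opener = urllib.request.build_opener()
    # Replace the whole default list, which carries the Python-urllib
    # User-agent, instead of appending a second User-Agent entry to it.
    opener.addheaders = [("User-Agent", user_agent), ("Accept", "*/*")]
    return opener


opener = make_opener(BROWSER_UA)
print(opener.addheaders)

# opener.open("http://www.example.com/") would then send these headers
# (placeholder URL; not called here, so nothing touches the network).
```

Replacing the list outright, rather than appending to it, avoids ending up with both the default and the custom value in addheaders.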
I answered a similar question a couple weeks ago.
There is example code in that question, but basically you set the header on the request yourself. (Note the capitalization of User-Agent, as in RFC 2616, section 14.43.)
For Python 3, urllib is split into several modules...
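To make the Python 3 module split concrete, here is a minimal sketch that touches three of the submodules: urllib.parse to build the query string, urllib.request for the request itself, and urllib.error for failures. The build_request and fetch helpers, the URL, and the query parameter are all made up for illustration.

```python
import urllib.error
import urllib.parse
import urllib.request


def build_request(base_url, params, user_agent):
    """Hypothetical helper: GET request with a query string and custom User-Agent."""
    query = urllib.parse.urlencode(params)
    return urllib.request.Request(
        base_url + "?" + query,
        headers={"User-Agent": user_agent},
    )


def fetch(request, timeout=10):
    """Hypothetical helper: return the response body as bytes, or None on failure."""
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.read()
    except urllib.error.URLError as exc:
        # HTTPError is a subclass of URLError, so HTTP status errors land here too.
        print("download failed:", exc)
        return None


# Placeholder URL and parameters (assumptions, not from the question).
request = build_request(
    "http://www.example.com/search", {"q": "user agent"}, "Mozilla/5.0"
)
print(request.full_url)

# body = fetch(request)  # uncomment to actually perform the download
```

In Python 2 the same pieces lived in urllib (urlencode) and urllib2 (Request, urlopen, URLError).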