Using urllib2 via proxy


Question:

I am trying to use urllib2 through a proxy; however, after trying just about every variation of passing my credentials with urllib2, I either get a request that hangs forever and returns nothing, or I get 407 errors. I can browse the web fine with my browser, which connects via a proxy auto-config (PAC) file and redirects accordingly; however, I can't seem to get anything to work from the command line with curl, wget, urllib2, etc., even if I use the proxies that the PAC file redirects to. I tried setting my proxy to each of the proxies from the PAC file using urllib2, and none of them work.

My current script looks like this:

import urllib2 as url

# Proxy credentials embedded in the proxy string
proxy = url.ProxyHandler({'http': 'username:password@my.proxy:8080'})
# HTTPBasicAuthHandler answers 401 challenges from the target server,
# not the 407 challenges that come from a proxy
auth = url.HTTPBasicAuthHandler()
opener = url.build_opener(proxy, auth, url.HTTPHandler)
url.install_opener(opener)
url.urlopen("http://www.google.com/")

which throws HTTP Error 407: Proxy Authentication Required. I also tried:

import urllib2 as url

# Register the proxy's credentials against its URL
handlePass = url.HTTPPasswordMgrWithDefaultRealm()
handlePass.add_password(None, "http://my.proxy:8080", "username", "password")
auth_handler = url.HTTPBasicAuthHandler(handlePass)
# No ProxyHandler is passed in, so build_opener falls back to the default
# one, which reads proxy settings from the environment (and connects
# directly if none are set)
opener = url.build_opener(auth_handler)
url.install_opener(opener)
url.urlopen("http://www.google.com")

which hangs the same way curl and wget do when they time out.

What do I need to do to diagnose the problem? How is it possible that I can connect via my browser but not from the command line on the same computer using what would appear to be the same proxy and credentials?

Might it have something to do with the router? If so, how could it distinguish between browser HTTP requests and command-line HTTP requests?
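
For reference, the only extra visibility I know how to get out of urllib2 itself is its built-in debug output; a minimal sketch, assuming the same proxy details as above (debuglevel=1 makes httplib print the raw request and response headers to stdout):

import urllib2 as url

# debuglevel=1 dumps each request and the proxy's reply to stdout,
# which should at least show the 407 challenge and whether the proxy
# is asking for Basic or some other scheme (e.g. NTLM)
http_logger = url.HTTPHandler(debuglevel=1)
proxy = url.ProxyHandler({'http': 'http://username:password@my.proxy:8080'})
opener = url.build_opener(proxy, http_logger)
url.install_opener(opener)
url.urlopen("http://www.google.com/")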

Answer 1:

Frustrations like this are what drove me to use Requests. If you're doing significant amounts of work with urllib2, you really ought to check it out. For example, to do what you wish to do using Requests, you could write:

import requests
from requests.auth import HTTPProxyAuth

proxy = {'http': 'http://my.proxy:8080'}
auth = HTTPProxyAuth('username', 'password')
r = requests.get('http://www.google.com/', proxies=proxy, auth=auth)
print r.text
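
As an aside, Requests also understands credentials embedded directly in the proxy URL, so the same request can be written without HTTPProxyAuth; a sketch under that assumption:

import requests

# Credentials carried in the proxy URL itself rather than via HTTPProxyAuth
proxy = {'http': 'http://username:password@my.proxy:8080'}
r = requests.get('http://www.google.com/', proxies=proxy)
print r.text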

Or you could wrap it in a Session object and every request will automatically use the proxy information (plus it will store & handle cookies automatically!):

s = requests.Session()
# Session() takes no constructor arguments; configure it through attributes
s.proxies = proxy
s.auth = auth
r = s.get('http://www.google.com/')
print r.text
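
One caveat: Requests picks the proxy entry by the scheme of the URL being fetched, so if you also need to reach https:// sites through the proxy, the dict needs an 'https' key too; a sketch, assuming the same proxy endpoint handles both schemes:

# Assumption: the one proxy at my.proxy:8080 carries both plain and TLS traffic
proxy = {
    'http': 'http://my.proxy:8080',
    'https': 'http://my.proxy:8080',
}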