Python: Disable http_proxy in urllib2

Posted 2019-01-23 17:44

Question:

I am using a proxy set through an environment variable (export http_proxy=example.com). For one call made with urllib2, I need to temporarily disable it, i.e. unset http_proxy. I have tried various methods suggested in the documentation and elsewhere online, but so far I have been unable to unset the proxy. Here is what I have tried:

import urllib
import urllib2

# Attempt 1: doesn't work
req = urllib2.Request('http://www.google.com')
req.set_proxy(None, None)
urllib2.urlopen(req)

# Attempt 2: also doesn't work
urllib.getproxies = lambda x=None: {}

Answer 1:

The urllib2 documentation suggests the following should work. Is it one of the approaches you have tried?

import urllib2

proxy_handler = urllib2.ProxyHandler({})
opener = urllib2.build_opener(proxy_handler)
page = opener.open('http://www.google.com')
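
For readers on Python 3, where urllib2 was merged into urllib.request, the same idea can be sketched as below. This is only a minimal sketch: the helper names build_direct_opener and fetch_direct and the 10-second timeout are my own assumptions, not part of the original answer.

```python
import urllib.request


def build_direct_opener():
    """Return an opener that ignores http_proxy/https_proxy settings."""
    # An explicit empty mapping stops ProxyHandler from reading the environment.
    no_proxy_handler = urllib.request.ProxyHandler({})
    return urllib.request.build_opener(no_proxy_handler)


def fetch_direct(url, timeout=10):
    """Fetch url without a proxy and return the response body as bytes."""
    opener = build_direct_opener()
    with opener.open(url, timeout=timeout) as response:
        return response.read()
```

Only requests made through this opener skip the proxy; plain urlopen() calls elsewhere in the program keep using the environment settings.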


Answer 2:

You can put this before the code where you want to disable system proxies.

import urllib2
urllib2.getproxies = lambda: {}

Sometimes this is better than creating an empty ProxyHandler, because it also works for external libraries, even if they create their own urllib2 openers.

Another option is to disable the proxy temporarily with the contextmanager decorator, but I can't guarantee that it will work with multiple threads:

import urllib2
from contextlib import contextmanager
from selenium import webdriver

@contextmanager
def no_proxies():
    orig_getproxies = urllib2.getproxies
    urllib2.getproxies = lambda: {}
    try:
        yield
    finally:
        # Restore the original function even if the body raises
        urllib2.getproxies = orig_getproxies

with no_proxies():
    driver = webdriver.Ie()
    driver.get("http://google.com")

In this example we prevent python-selenium from using the system proxy settings, which otherwise lead to errors like these:

IE and Chrome not working with Selenium2 Python

Unable to run IEDriverServer.exe with proxy set up in IE internet option
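
A rough Python 3 version of the context manager idea might look like the sketch below. It assumes urllib.request in place of urllib2, and the proxy address proxy.invalid:3128 is a made-up value used only to show the effect. Because it swaps a module-level function, the multi-threading caveat mentioned above still applies.

```python
import os
import urllib.request
from contextlib import contextmanager


@contextmanager
def no_proxies():
    """Temporarily make urllib.request ignore system and environment proxies."""
    original = urllib.request.getproxies
    urllib.request.getproxies = lambda: {}
    try:
        yield
    finally:
        # Runs even if the body raises, so the patch never leaks out.
        urllib.request.getproxies = original


# Hypothetical proxy address, used only to show the effect of the patch.
os.environ["http_proxy"] = "http://proxy.invalid:3128"

with no_proxies():
    inside = urllib.request.ProxyHandler().proxies   # built while patched
outside = urllib.request.ProxyHandler().proxies      # built after restore
```

Here inside should be an empty mapping, while outside picks up the http_proxy value from the environment again.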



Answer 3:

If you want to avoid using a proxy for a known set of sites, you can use the no_proxy environment variable like this:

$ export no_proxy="google.com,stackoverflow.com,mysite.org:8080"

(This is a comma-separated list of hostname suffixes; a port can be specified as well.)

This should work with both urllib and urllib2.
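
If you would rather set this from inside the script, or want to check which hosts the list covers, something like the following sketch may help. It assumes Python 3, and it relies on urllib.request.proxy_bypass_environment, a helper that exists in the module but is not part of the documented API, so treat its use here as an assumption. The name is_bypassed is my own.

```python
import os
import urllib.request

# Same value as the shell export above.
os.environ["no_proxy"] = "google.com,stackoverflow.com,mysite.org:8080"


def is_bypassed(host):
    """Return True if urllib would skip the proxy for host, per no_proxy."""
    return bool(urllib.request.proxy_bypass_environment(host))
```

With this list, is_bypassed("www.google.com") should be true because the host ends with a listed suffix, while a host such as example.org is not covered and would still go through the proxy.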



Answer 4:

Another way is to monkeypatch the socket module using the socks library, like this:

import socket
import socks
import urllib2

def create_connection(address, timeout=None, source_address=None):
    # timeout and source_address are accepted for compatibility but ignored
    sock = socks.socksocket()
    sock.connect(address)
    return sock

socks.setdefaultproxy(None, None)  # this does ["0.0.0.0"], [0]
# Route every new socket through socks, which now has no proxy configured
socket.socket = socks.socksocket
socket.create_connection = create_connection
print urllib2.urlopen("http://httpbin.org/ip").read()

It seems that setting the default proxy to 0.0.0.0 on port 0 is enough to stop it from being used, apparently because inet_aton() does not accept 0.0.0.0 as a valid IP address.

Admittedly, I have not checked exactly why this works, but it does. The easiest way to check is to set a proxy first, fetch a URL with any library, and then try again without setting a proxy. You will still be caught by the last proxy that was set :) unless you "unset" it for the following connections.