Using urllib2 with SOCKS proxy

Posted 2019-01-08 21:39

Question:

Is it possible to fetch pages with urllib2 through a SOCKS proxy, on a one-SOCKS-server-per-opener basis? I've seen the solution using the setdefaultproxy method, but I need to have different SOCKS proxies in different openers.

So there is the SocksiPy library, which works great, but it has to be used this way:

import socks
import socket
# Monkeypatch the standard socket with SocksiPy's socksocket, then set one global proxy.
socket.socket = socks.socksocket
import urllib2
socks.setdefaultproxy(socks.PROXY_TYPE_SOCKS5, "x.x.x.x", y)

That is, it sets the same proxy for ALL urllib2 requests. How can I have different proxies for different openers?

Answer 1:

Try with pycurl:

import pycurl
c1 = pycurl.Curl()
c1.setopt(pycurl.URL, 'http://www.google.com')
c1.setopt(pycurl.PROXY, 'localhost')
c1.setopt(pycurl.PROXYPORT, 8080)
c1.setopt(pycurl.PROXYTYPE, pycurl.PROXYTYPE_SOCKS5)

c2 = pycurl.Curl()
c2.setopt(pycurl.URL, 'http://www.yahoo.com')
c2.setopt(pycurl.PROXY, 'localhost')
c2.setopt(pycurl.PROXYPORT, 8081)
c2.setopt(pycurl.PROXYTYPE, pycurl.PROXYTYPE_SOCKS5)

c1.perform() 
c2.perform() 
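
By default pycurl prints each response body to stdout; here is a minimal sketch for capturing it into a buffer instead (same hypothetical localhost SOCKS proxy and port as above):

import pycurl
from StringIO import StringIO

buf = StringIO()
c = pycurl.Curl()
c.setopt(pycurl.URL, 'http://www.google.com')
c.setopt(pycurl.PROXY, 'localhost')
c.setopt(pycurl.PROXYPORT, 8080)
c.setopt(pycurl.PROXYTYPE, pycurl.PROXYTYPE_SOCKS5)
c.setopt(pycurl.WRITEFUNCTION, buf.write)   # collect the body instead of printing it
c.perform()
body = buf.getvalue()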


Answer 2:

Yes, you can. I'll repeat my answer from How can I use a SOCKS 4/5 proxy with urllib2?: you need to create an opener for every proxy, just as you do with an HTTP proxy. The code that adds this feature to SocksiPy is available on GitHub at https://gist.github.com/869791 and is as simple as:

opener = urllib2.build_opener(SocksiPyHandler(socks.PROXY_TYPE_SOCKS4, 'localhost', 9999))
print opener.open('http://www.whatismyip.com/automation/n09230945.asp').read()
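
To get one SOCKS server per opener, build a separate handler for each one. A minimal sketch, assuming the gist's SocksiPyHandler class is saved locally (e.g. as sockshandler.py) and using illustrative localhost ports:

import socks
import urllib2
from sockshandler import SocksiPyHandler   # wherever you saved the gist code

# Each opener carries its own handler, and therefore its own SOCKS proxy.
opener1 = urllib2.build_opener(SocksiPyHandler(socks.PROXY_TYPE_SOCKS5, 'localhost', 9050))
opener2 = urllib2.build_opener(SocksiPyHandler(socks.PROXY_TYPE_SOCKS5, 'localhost', 9051))

print opener1.open('http://www.whatismyip.com/automation/n09230945.asp').read()
print opener2.open('http://www.whatismyip.com/automation/n09230945.asp').read()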

For more information, I've written an example that runs multiple Tor instances to behave like a rotating proxy: Distributed Scraping With Multiple Tor Circuits.



Answer 3:

There is only one socket module shared by all openers, and SOCKS support is implemented at the socket level, so you can't. I suggest you use the pycurl library instead; it is much more flexible.



Answer 4:

== EDIT == (old HTTP-Proxy example was here..)

My fault: urllib2 has no built-in support for SOCKS proxying.

There are some 'hacks' that add SOCKS support to urllib2 (or to the socket object in general) here, but I doubt they will work with multiple proxies the way you require.

As long as you don't want to hook/subclass urllib2.ProxyHandler, I would suggest going with pycurl.
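
For reference, here is a rough sketch of what such a hook could look like, modelled on the SocksiPyHandler gist linked in another answer: it subclasses urllib2.HTTPHandler (rather than ProxyHandler) and hands httplib a socksocket. Class names and proxy settings are illustrative, not a tested implementation:

import httplib
import socks
import urllib2

class SocksConnection(httplib.HTTPConnection):
    def __init__(self, proxyargs, *args, **kwargs):
        self.proxyargs = proxyargs          # e.g. (socks.PROXY_TYPE_SOCKS5, host, port)
        httplib.HTTPConnection.__init__(self, *args, **kwargs)

    def connect(self):
        # Use a SOCKS-wrapped socket bound to this handler's proxy settings.
        self.sock = socks.socksocket()
        self.sock.setproxy(*self.proxyargs)
        self.sock.connect((self.host, self.port))

class SocksHandler(urllib2.HTTPHandler):
    def __init__(self, *proxyargs):
        self.proxyargs = proxyargs
        urllib2.HTTPHandler.__init__(self)

    def http_open(self, req):
        def factory(host, **kwargs):
            return SocksConnection(self.proxyargs, host, **kwargs)
        return self.do_open(factory, req)

opener = urllib2.build_opener(SocksHandler(socks.PROXY_TYPE_SOCKS5, 'localhost', 9050))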



Answer 5:

You might be able to use threading locks if there aren't too many connections being made at once and you need to make requests from multiple threads:

import socks
import socket
import thread
import urllib2

lock = thread.allocate_lock()
socket.socket = socks.socksocket

def GetConn():
    # setdefaultproxy() changes module-wide state, so hold the lock from setting
    # the proxy until the request has actually been opened.
    lock.acquire()
    try:
        socks.setdefaultproxy(socks.PROXY_TYPE_SOCKS5, "x.x.x.x", y)
        conn = urllib2.urlopen(ARGUMENTS HERE)
    finally:
        lock.release()
    return conn

You might also be able to use something like this every time you need to get a connection:

import imp
urllib2 = imp.load_module('urllib2_copy', *imp.find_module('urllib2'))  # a private copy of the module
urllib2.socket = dummy_class()  # dummy_class needs the socket module's methods

These are obviously not fantastic solutions, but I've put in my 2¢ anyway :-)



Answer 6:

A cumbersome but working solution for using a SOCKS proxy is to set up Privoxy with proxy chaining and then set HTTP_PROXY to the HTTP proxy that Privoxy provides, via a system variable or any other way.
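
A sketch of how that might look: with a forward-socks5 line in Privoxy's configuration (the Tor-style port 9050 below is illustrative), urllib2 only ever sees a plain HTTP proxy on Privoxy's default port 8118:

# Privoxy config, chaining every request to a local SOCKS5 proxy:
#     forward-socks5 / 127.0.0.1:9050 .
import urllib2

opener = urllib2.build_opener(urllib2.ProxyHandler({'http': 'http://127.0.0.1:8118'}))
print opener.open('http://www.example.com/').read()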



Answer 7:

You could do it by setting the environment variable HTTP_PROXY in the following format:

user:pass@proxy:port

or, if you use a bat/cmd file, add this before calling the script:

set HTTP_PROXY=user:pass@proxy:port

I am using such a cmd file to make easy_install work behind a proxy.
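
The same thing can be done from Python itself; a small sketch (the proxy string is the same placeholder format as above; urllib2 reads HTTP_PROXY from the environment when no explicit ProxyHandler is installed):

import os
import urllib2

os.environ['HTTP_PROXY'] = 'http://user:pass@proxy:port'   # placeholder, as above
print urllib2.urlopen('http://www.example.com/').read()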