Question:

This question already has an answer here:

Python proxy.. A simple one! [closed] 2 answers

I have looked everywhere and found millions of Python proxy servers, but none do precisely what I would like (I think :s)

I have had quite a bit of experience with Python generally, but I'm quite new to the deep, dark secrets of the HTTP protocol.

What I think might be useful is a very simple proxy example that can be connected to and will then itself try to connect to the address passed to it.

Also, I think what has been confusing me is everything the hidden machinery is doing. For example, if the class inherits from BaseHTTPServer.BaseHTTPRequestHandler, what precisely happens when a page is requested? In many of the examples I have found there is no reference to a path variable, then suddenly, poof! self.path is used in a function. I'm assuming it has been inherited, but how does it end up holding the requested path?

I'm sorry if that didn't make much sense, as my idea of my problem is probably scrambled :(

If you can think of anything which would make my question clearer, please, please suggest I add it. xxx

Edit:

Also, a link to an explanation of the detailed process by which the proxy handles the request, requests the page (and how to read/modify the data at this point) and passes it back to the original requester would be greatly appreciated xxxx

Answer 1:

"a very simple proxy example that can be connected to and will then itself try to connect to the address passed to it." That is practically the definition of an HTTP proxy.

There's a really simple proxy example here: http://effbot.org/librarybook/simplehttpserver.htm

The core of it is just 3 lines (this is Python 2, and it also needs "import SimpleHTTPServer" and "import urllib" at the top):

class Proxy(SimpleHTTPServer.SimpleHTTPRequestHandler):
    def do_GET(self):
        self.copyfile(urllib.urlopen(self.path), self.wfile)

So it's a SimpleHTTPRequestHandler that, in response to a GET request, opens the URL in the path (a request to a proxy typically looks like "GET http://example.com/", not like "GET /index.html"). It then just copies whatever it can read from that URL to the response.

Note that this is really minimal. It doesn't deal with headers at all, I believe.
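
For anyone on Python 3, where SimpleHTTPServer became http.server and urllib.urlopen became urllib.request.urlopen, here is a rough sketch of the same idea. It is not the effbot code: the names make_proxy and _opener are my own, the choice to forward only the Content-Type header is an assumption made to keep it short, and it only handles plain "GET http://..." requests.

```python
import shutil
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Fetch upstream pages directly, ignoring any *_proxy environment variables,
# so the proxy cannot end up forwarding requests to itself.
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


class Proxy(BaseHTTPRequestHandler):
    def do_GET(self):
        # A client talking to a proxy sends the absolute URL in the request
        # line, so self.path looks like "http://example.com/".
        if not self.path.startswith("http://"):
            self.send_error(400, "Expected an absolute http:// URL")
            return
        try:
            upstream = _opener.open(self.path)
        except urllib.error.HTTPError as err:
            # The origin answered with an error status; pass the code along.
            self.send_error(err.code)
            return
        except urllib.error.URLError:
            # The origin could not be reached at all.
            self.send_error(502)
            return
        with upstream:
            self.send_response(upstream.status)
            # Assumption for brevity: only Content-Type is forwarded.
            self.send_header(
                "Content-Type",
                upstream.headers.get("Content-Type", "application/octet-stream"),
            )
            self.end_headers()
            # Stream the upstream body to the client unchanged.
            shutil.copyfileobj(upstream, self.wfile)


def make_proxy(port=8080):
    """Build a proxy server bound to localhost (hypothetical helper)."""
    return ThreadingHTTPServer(("127.0.0.1", port), Proxy)

# To run it: make_proxy(8080).serve_forever()
```

The copyfileobj call is the place where you could read or modify the data instead of copying it straight through.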

BTW: path is documented at http://docs.python.org/library/basehttpserver.html. The base class parses the request line and sets it before your do_* method is called.
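
To see where self.path comes from, it may help to run a handler that just echoes what the base class has already parsed. This is only an illustration in Python 3 (http.server is the renamed BaseHTTPServer), and EchoRequestLine is a name I made up. The base class reads the request line, splits it into command, path and version, stores them on the handler, and then looks up the method named "do_" plus the command.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


class EchoRequestLine(BaseHTTPRequestHandler):
    def do_GET(self):
        # By the time do_GET runs, the base class has already parsed the
        # request line "GET <path> <version>" into these three attributes.
        body = "command={}\npath={}\nversion={}\n".format(
            self.command, self.path, self.request_version
        ).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

# To try it: HTTPServer(("127.0.0.1", 8000), EchoRequestLine).serve_forever()
```

If a browser is configured to use this server as its proxy, path comes back as the full URL (for example "http://example.com/"), whereas a direct request shows only something like "/index.html".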

Answer 2:

From the Twisted wiki:

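from twisted.web import proxy, http
from twisted.internet import reactor
from twisted.python import log
import sys
# Send Twisted's log output to stdout
log.startLogging(sys.stdout)

class ProxyFactory(http.HTTPFactory):
    # Each incoming connection is handled by Twisted's built-in proxy protocol
    protocol = proxy.Proxy

# Listen for proxy clients on port 8080 and start the event loop
reactor.listenTCP(8080, ProxyFactory())
reactor.run()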