Python Intercept Web Traffic from Browser

Posted 2020-02-28 04:38

Question:

I'm trying to create a simple web filtering app in Python. The way I want to do this is to monitor traffic on TCP ports 80 and 443 (HTTP and HTTPS), and if there is traffic, I want to check something before I let it go through. If it fails the check, I would like the user to be redirected to a page of my choosing.

So my question is: when the user visits http://www.google.com in the browser, is there a way that I can intercept that request, and is there a way I can redirect them to another page of my choosing?

Answer 1:

You need to write a web proxy, and set your web client's proxy server to http://localhost:8000/ (or whatever address the proxy is listening on).

Your web client will then send HTTP like this:

GET http://www.google.com/ HTTP/1.1

to your proxy which it must then rewrite as:

GET / HTTP/1.1
Host: www.google.com

and send on to www.google.com, reading the response and then sending it back to the client on the original socket. Note that this explanation is massively simplified.
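The rewriting step described above can be sketched in a few lines of Python. This is only an illustration, not a full proxy: the function name rewrite_request_line is made up for this example, it handles plain http:// URLs only, and it ignores headers and the request body. HTTPS traffic reaches a proxy as a CONNECT request instead and is not covered here.

```python
from urllib.parse import urlsplit


def rewrite_request_line(request_line):
    """Turn a proxy-style request line into (host, port, origin-form line).

    A browser talking to a proxy sends the full URL in the request line;
    the origin server expects only the path (plus a Host header).
    The function name is hypothetical, chosen for this sketch.
    """
    method, target, version = request_line.split(" ", 2)
    parts = urlsplit(target)
    if parts.scheme != "http" or not parts.hostname:
        raise ValueError("expected an absolute http:// URL, got %r" % target)
    # An empty path means the site root.
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    # Default to port 80 when the URL does not name one.
    port = parts.port or 80
    return parts.hostname, port, "%s %s %s" % (method, path, version)


host, port, line = rewrite_request_line("GET http://www.google.com/ HTTP/1.1")
print(host, port, line)
```

The proxy would then open a socket to (host, port), send the rewritten line followed by the original headers, and relay the response back to the browser.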

Anyway, it's all standard stuff, and I suspect Python web proxies already exist for you to hack on.

Edit: http://proxies.xhaus.com/python/



Answer 2:

This is from a blog post I wrote a while back, using webob and paste. TransparentProxy forwards the request to whatever URL the request specifies. You can write middleware to do something with the request before it gets handed off to the TransparentProxy.

Then just set your browser's proxy settings to whatever address your proxy is running on.

This example prints the request and the response. For your case, you would check the response status for a 404 or 302 or whatever, and dispatch to code you write.

from webob.dec import wsgify
from paste import httpserver
from paste.proxy import TransparentProxy


def print_trip(request, response):
    """
    just prints the request and response
    """
    print("Request\n==========\n\n")
    print(str(request))
    print("\n\n")
    print("Response\n==========\n\n")
    print(str(response))
    print("\n\n")


class HTTPMiddleware(object):
    """
    serializes every request and response
    """

    def __init__(self, app, record_func=print_trip):
        self._app = app
        self._record = record_func

    @wsgify
    def __call__(self, req):
        # forward the request to the wrapped app (here, the proxy)
        result = req.get_response(self._app)
        try:
            self._record(req.copy(), result.copy())
        except Exception as ex:  # return the response at all costs
            print(ex)
        return result

# listen on all interfaces, port 8088
httpserver.serve(HTTPMiddleware(TransparentProxy()), "0.0.0.0", port=8088)

Edit:

Here's an example of middleware I wrote so I could intercept a path and return a different response. I use this to test a JavaScript-heavy application that is hardcoded for production: I intercept config.js and output my own, which has unit-test-specific settings.

from webob import Request, Response


class FileIntercept(object):
    """
    wsgi: middleware
    given request.path will call wsgi app matching that path instead
    of dispatching to the wrapped application
    """
    def __init__(self, app, file_intercept=None):
        self._app = app
        # avoid sharing one mutable default dict between instances
        self._f = file_intercept if file_intercept is not None else {}

    def __call__(self, environ, start_response):
        request = Request(environ)
        if request.path.lower() in self._f:
            # intercepted path: answer from the registered app/response
            response = request.get_response(self._f[request.path.lower()])
        else:
            response = request.get_response(self._app)
        return response(environ, start_response)

As an example, I would initialize it like so:

app = FileIntercept(TransparentProxy(),
                    file_intercept={"/js/config.js": Response("/*new settings*/")})
httpserver.serve(HTTPMiddleware(app), "0.0.0.0", port=8088)
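In the same middleware style, the check-and-redirect behaviour the question asks for might look roughly like the sketch below. It is written against plain WSGI so that it runs with only the standard library. The names RedirectFilter, is_allowed, demo_app and BLOCKED, and the block-page URL, are all assumptions made up for this example, and demo_app merely stands in for TransparentProxy().

```python
class RedirectFilter(object):
    """WSGI middleware: redirect disallowed hosts, pass everything else on.

    Hypothetical sketch; not part of webob or paste.
    """

    def __init__(self, app, is_allowed, redirect_to):
        self._app = app
        self._is_allowed = is_allowed    # callable: host name -> bool
        self._redirect_to = redirect_to  # URL of the page to send users to

    def __call__(self, environ, start_response):
        # Strip an optional ":port" suffix and normalise case.
        host = environ.get("HTTP_HOST", "").split(":")[0].lower()
        if self._is_allowed(host):
            return self._app(environ, start_response)
        body = b"Blocked\n"
        start_response("302 Found", [
            ("Location", self._redirect_to),
            ("Content-Type", "text/plain"),
            ("Content-Length", str(len(body))),
        ])
        return [body]


def demo_app(environ, start_response):
    # Stand-in for TransparentProxy() so the sketch runs without paste.
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"upstream response\n"]


BLOCKED = {"www.google.com"}  # example block list, purely illustrative
app = RedirectFilter(demo_app,
                     lambda host: host not in BLOCKED,
                     "http://localhost:8000/blocked")
```

To use it with the proxy above, one would presumably wrap TransparentProxy() instead of demo_app and pass the result to httpserver.serve, assuming the server fills in HTTP_HOST from the request's Host header as WSGI servers normally do.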


Answer 3:

If it's a specific website, like google.com, you could always poison the hosts file. It would be an ugly but simple solution.

If you go this route, on Windows the file is located at:

C:\Windows\System32\drivers\etc\hosts

On Linux it is /etc/hosts.
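For illustration, a small Python helper for this approach might look like the following. The function names hosts_path and add_override are invented for this sketch. It only builds the new file contents as a string; actually writing the hosts file needs administrator or root rights and is left out.

```python
import sys


def hosts_path(platform=sys.platform):
    """Return the usual hosts-file location for the given platform string."""
    if platform.startswith("win"):
        return r"C:\Windows\System32\drivers\etc\hosts"
    return "/etc/hosts"


def add_override(hosts_text, hostname, ip="127.0.0.1"):
    """Return hosts_text with an 'ip hostname' line appended.

    The text is returned unchanged if the host name is already mapped.
    """
    for line in hosts_text.splitlines():
        # Drop any trailing comment, then split into "ip name [alias...]".
        fields = line.split("#", 1)[0].split()
        if hostname in fields[1:]:
            return hosts_text
    if hosts_text and not hosts_text.endswith("\n"):
        hosts_text += "\n"
    return hosts_text + "%s %s\n" % (ip, hostname)


print(hosts_path())
print(add_override("127.0.0.1 localhost\n", "www.google.com"), end="")
```

Pointing the name at 127.0.0.1 only helps if something is listening there to serve the replacement page, and HTTPS sites will show certificate errors because the local server cannot present the real site's certificate.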