I'm trying to create a simple web filtering app in Python. The way I want to do this is to monitor traffic on TCP ports 80 and 443 (HTTP/HTTPS), and if there is traffic, I want to check something before I let it go through. If it fails the check, I would like the user to be redirected to a page of my choosing.
So my question is: when the user visits http://www.google.com in the browser, is there a way that I can intercept that request, and is there a way I can redirect them to another page of my choosing?
You need to write a web proxy, and set your web client's proxy server to http://localhost:8000/ (or whatever address the proxy is listening on).
Your web client will then send HTTP like this:
GET http://www.google.com/ HTTP/1.1
to your proxy, which must then rewrite it as:
GET / HTTP/1.1
Host: www.google.com
and send it on to www.google.com, getting the response and then sending it back on the original socket to the client. Note that this explanation is massively simplified.
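The rewriting step described above can be sketched in a few lines of Python 3. This is only an illustration under assumptions: `rewrite_request_line` is a hypothetical helper name, it handles plain HTTP only (HTTPS reaches a proxy as a `CONNECT` request instead), and it ignores headers entirely.

```python
from urllib.parse import urlsplit


def rewrite_request_line(request_line):
    """Split a proxy-style request line into (host, port, origin-form line).

    Hypothetical helper: 'GET http://host/path HTTP/1.1' becomes
    ('host', 80, 'GET /path HTTP/1.1').
    """
    method, target, version = request_line.split()
    parts = urlsplit(target)

    # An empty path means the root of the site.
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query

    # Default to port 80 when the URL does not name one (plain HTTP assumed).
    port = parts.port or 80
    return parts.hostname, port, "{} {} {}".format(method, path, version)


if __name__ == "__main__":
    print(rewrite_request_line("GET http://www.google.com/ HTTP/1.1"))
```

A real proxy would then open a socket to the returned host and port, send the rewritten line plus the headers, and relay the response back to the client.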
Anyway, it's all standard stuff, and I suspect Python web proxies already exist for you to hack on.
Edit: http://proxies.xhaus.com/python/
This is from a blog post I wrote a while back, using WebOb and Paste. TransparentProxy forwards the request to whatever URL the request specifies. You can write middleware to do something with the request before it gets handed off to the TransparentProxy.
Then just set your browser's proxy settings to whatever address your proxy is running on.
This example prints the request and the response. For your case, you would check the response status for a 404 or 302 or whatever and dispatch to code you write.
from webob.dec import wsgify
from paste import httpserver
from paste.proxy import TransparentProxy


def print_trip(request, response):
    """
    just prints the request and response
    """
    print("Request\n==========\n\n")
    print(str(request))
    print("\n\n")
    print("Response\n==========\n\n")
    print(str(response))
    print("\n\n")


class HTTPMiddleware(object):
    """
    serializes every request and response
    """

    def __init__(self, app, record_func=print_trip):
        self._app = app
        self._record = record_func

    @wsgify
    def __call__(self, req):
        # forward the request to the wrapped app (here, the proxy)
        result = req.get_response(self._app)
        try:
            self._record(req.copy(), result.copy())
        except Exception as ex:  # return response at all costs
            print(ex)
        return result


httpserver.serve(HTTPMiddleware(TransparentProxy()), "0.0.0.0", port=8088)
Edit:
Here's an example of middleware I wrote so I could intercept a path and return a different response. I use this to test a JavaScript-heavy application that is hardcoded for production: I intercept config.js and output my own version, which has unit-test-specific settings.
from webob import Request, Response


class FileIntercept(object):
    """
    wsgi: middleware
    given request.path will call wsgi app matching that path instead
    of dispatching to the wrapped application
    """

    def __init__(self, app, file_intercept=None):
        self._app = app
        # maps a lowercased path to the WSGI app that should answer it
        self._f = file_intercept or {}

    def __call__(self, environ, start_response):
        request = Request(environ)
        if request.path.lower() in self._f:
            response = request.get_response(self._f[request.path.lower()])
        else:
            response = request.get_response(self._app)
        return response(environ, start_response)
As an example, I would initialize it like so:
app = FileIntercept(TransparentProxy(),
                    file_intercept={"/js/config.js": Response("/*new settings*/")})
httpserver.serve(HTTPMiddleware(app), "0.0.0.0", port=8088)
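For the case in the original question (run a check, then redirect when it fails), the same middleware idea might look like the following sketch. It uses only the plain WSGI interface so it has no dependencies; `make_filter`, `is_allowed`, and the block-page URL are hypothetical names chosen for illustration, not part of Paste or WebOb.

```python
def make_filter(app, is_allowed, block_url):
    """Wrap a WSGI app; redirect to block_url when is_allowed(host) is false.

    Hypothetical helper: `is_allowed` is any callable taking a hostname
    and returning True or False.
    """
    def middleware(environ, start_response):
        # The Host header may carry a port ("example.com:8080"); drop it.
        host = environ.get("HTTP_HOST", "").split(":")[0].lower()
        if is_allowed(host):
            return app(environ, start_response)
        # Check failed: answer with a redirect instead of forwarding.
        start_response("302 Found", [("Location", block_url),
                                     ("Content-Length", "0")])
        return [b""]
    return middleware


if __name__ == "__main__":
    def demo_app(environ, start_response):
        start_response("200 OK", [("Content-Type", "text/plain")])
        return [b"ok"]

    blocked = {"www.google.com"}
    filtered = make_filter(demo_app, lambda h: h not in blocked,
                           "http://localhost:8000/blocked")
    print(filtered({"HTTP_HOST": "www.google.com"}, lambda s, h: print(s)))
```

Since TransparentProxy is itself a WSGI application, passing it as `app` should work the same way as the demo app here, though that combination is untested in this sketch.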
If it's a specific website, like google.com, you could always poison the hosts file. It would be an ugly but simple solution.
If you go that route, on Windows the file is located at:
C:\Windows\System32\drivers\etc\hosts
On Linux it is also under etc, at /etc/hosts.
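As a rough illustration of the hosts-file approach, the sketch below picks the file location per platform and formats one entry. Both function names are hypothetical, and the snippet deliberately does not write to the file, since that needs administrator rights.

```python
import os
import sys


def hosts_path(platform=sys.platform):
    """Return the usual hosts-file location for the given platform string."""
    if platform.startswith("win"):
        # SystemRoot normally points at C:\Windows.
        root = os.environ.get("SystemRoot", r"C:\Windows")
        return root + r"\System32\drivers\etc\hosts"
    return "/etc/hosts"


def hosts_entry(ip, hostname):
    """Format one hosts-file line mapping hostname to ip."""
    return "{}\t{}".format(ip, hostname)


if __name__ == "__main__":
    print(hosts_path())
    print(hosts_entry("127.0.0.1", "www.google.com"))
```

An entry like this only changes which IP address the name resolves to, so a web server at that address still has to serve the page you want the user to see.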