Python BaseHTTPServer not serving requests properly

Posted 2019-04-14 04:00

Question:

I'm trying to write a simple local proxy for JavaScript: since I need to load some content from JavaScript within a web page, I wrote this simple daemon in Python:

import string,cgi,time
from os import curdir, sep
import urllib
import urllib2

from BaseHTTPServer import BaseHTTPRequestHandler, HTTPServer


class MyHandler(BaseHTTPRequestHandler):



    def fetchurl(self, url, post, useragent, cookies):
        headers={"User-Agent":useragent, "Cookie":cookies}

        url=urllib.quote_plus(url, ":/?.&-=")
        if post:
            req = urllib2.Request(url,post,headers)
        else:
            req=urllib2.Request(url, None, headers)
        try:
            response=urllib2.urlopen(req)
        except urllib2.HTTPError, e:
            # HTTPError is a subclass of URLError, so it must be caught first
            print "HTTPERROR: "+str(e)
            return False
        except urllib2.URLError, e:
            print "URLERROR: "+str(e)
            return False
        else:
            return response.read()


    def do_GET(self):
        if self.path != "/":
            [callback, url, post, useragent, cookies]=self.path[1:].split("%7C")

            print "callback = "+callback
            print "url = "+url
            print "post = "+post
            print "useragent = "+useragent
            print "cookies = "+cookies

            if useragent=="":
                useragent="pyjproxy v. 1.0"

            load=self.fetchurl(url, post, useragent, cookies)

            if load:
                # Escape the fetched content so it is safe inside a JS string literal
                pack=load.replace("\\", "\\\\").replace("\"", "\\\"").replace("\n", "\\n").replace("\r", "\\r").replace("\t", "\\t").replace("</script>", "</scr\"+\"ipt>")
                response=callback+"(\""+pack+"\");"
                self.send_response(200)
                self.send_header('Content-type', 'text/javascript')
                self.end_headers()
                self.wfile.write(response)
                self.wfile.close()
                return
            else:
                self.send_error(404, 'File Not Found: %s' % self.path)
                return
        else:
            embedscript="function pyjload(datadict){ if(!datadict[\"url\"] || !datadict[\"callback\"]){return false;} if(!datadict[\"post\"]) datadict[\"post\"]=\"\"; if(!datadict[\"useragent\"]) datadict[\"useragent\"]=\"\"; if(!datadict[\"cookies\"]) datadict[\"cookies\"]=\"\"; var oHead = document.getElementsByTagName('head').item(0); var oScript= document.createElement(\"script\"); oScript.type = \"text/javascript\"; oScript.src=\"http://localhost:1180/\"+datadict[\"callback\"]+\"%7C\"+datadict[\"url\"]+\"%7C\"+datadict[\"post\"]+\"%7C\"+datadict[\"useragent\"]+\"%7C\"+datadict[\"cookies\"]; oHead.appendChild(oScript);}"
            self.send_response(200)
            self.send_header("Content-type", "text/html")
            self.end_headers()
            self.wfile.write(embedscript)
            self.wfile.close()
            return

def main():
    try:
        server = HTTPServer(('127.0.0.1', 1180), MyHandler)
        print 'started httpserver...'
        server.serve_forever()
    except KeyboardInterrupt:
        print '^C received, shutting down server'
        server.socket.close()

if __name__ == '__main__':
    main()

And I use it within a web page like this one:

<!DOCTYPE HTML>
<html><head>

<script>
function miocallback(htmlsource)
{
  alert(htmlsource);
}


</script>

<script type="text/javascript" src="http://localhost:1180"></script>


</head><body>


<a onclick="pyjload({'url':'http://www.google.it','callback':'miocallback'});"> Take the Red Pill</a>

</body></html>

Now, on Firefox and Chrome it looks like it always works. On Opera and Internet Explorer, however, I noticed that sometimes it doesn't work, or it hangs for a long time... What's going on, I wonder? Did I do something wrong?

Thanks for any help! Matteo

Answer 1:

You have to understand that (modern) browsers try to optimize their browsing speed using different techniques, which is why you get different results on different browsers.

In your case, the technique that caused you trouble is concurrent HTTP/1.1 session setup: in order to utilize your bandwidth better, your browser is able to start several HTTP/1.1 sessions at the same time. This allows it to retrieve multiple resources (e.g. images) simultaneously.

However, BaseHTTPServer is not threaded: as soon as your browser tries to open another connection, it will fail to do so because BaseHTTPServer is still blocked by the first session, which remains open. The request never reaches the handler and runs into a timeout. This also means that only one user can access your service at a given time. Inconvenient? Aye, but help is here:

Threads! ...and Python makes this one rather easy:

Derive a new class from HTTPServer using the ThreadingMixIn from the SocketServer module.

Example:

from BaseHTTPServer import HTTPServer, BaseHTTPRequestHandler
from SocketServer import ThreadingMixIn
import threading

class Handler(BaseHTTPRequestHandler):

    def do_HEAD(self):
        pass

    def do_GET(self):
        pass


class ThreadedHTTPServer(ThreadingMixIn, HTTPServer):
    """ This class allows to handle requests in separated threads.
        No further content needed, don't touch this. """

if __name__ == '__main__':
    server = ThreadedHTTPServer(('localhost', 80), Handler)
    print 'Starting server on port 80...'
    server.serve_forever()

From now on, BaseHTTPServer is threaded and ready to serve multiple connections (and therefore requests) at the same time, which will solve your problem.
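
Applied to the code from the question, a minimal sketch would look like the following (it assumes the MyHandler class and port 1180 defined there):

from BaseHTTPServer import HTTPServer
from SocketServer import ThreadingMixIn

class ThreadedHTTPServer(ThreadingMixIn, HTTPServer):
    """ Handle each request in its own thread. """

def main():
    try:
        # MyHandler and port 1180 are taken from the question's code
        server = ThreadedHTTPServer(('127.0.0.1', 1180), MyHandler)
        print 'started threaded httpserver...'
        server.serve_forever()
    except KeyboardInterrupt:
        print '^C received, shutting down server'
        server.socket.close()

if __name__ == '__main__':
    main()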

Instead of the ThreadingMixIn, you can also use the ForkingMixIn in order to spawn another process instead of another thread.
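
For reference, a minimal forking sketch under the same assumptions (it reuses the Handler class from the example above); note that ForkingMixIn relies on os.fork() and is therefore only available on Unix-like systems:

from BaseHTTPServer import HTTPServer
from SocketServer import ForkingMixIn

class ForkingHTTPServer(ForkingMixIn, HTTPServer):
    """ Handle each request in a separate child process (Unix only). """

if __name__ == '__main__':
    server = ForkingHTTPServer(('localhost', 80), Handler)
    print 'Starting forking server on port 80...'
    server.serve_forever()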

all the best,

creo



Answer 2:

Note that Python's BaseHTTPServer is a very basic HTTP server, far from perfect, but that's not your main issue.

What happens if you put the two scripts at the end of the document, just before the </body> tag? Does that help?