How to improve the performance of the combination of gevent and tornado

Posted 2019-04-13 21:02

Question:

I am trying to use gevent as the WSGI server and Tornado's WSGIApplication to process requests. Here's the code:

#!/usr/bin/env python
# coding=utf-8

import gevent
from gevent import monkey
monkey.patch_all(thread=False)

from gevent.pywsgi import WSGIServer

from tornado.wsgi import WSGIApplication
import tornado.web
import tornado.wsgi

import requests

class MainHandler(tornado.web.RequestHandler):
    def get(self):
        requests.get('http://google.com')
        self.write('hello')


handlers = [
    (r'/', MainHandler)
]


if __name__ == '__main__':
    application = WSGIApplication(handlers)
    server = WSGIServer(('', 9010), application)
    server.serve_forever()

I use ApacheBench to test the performance. The test command is:

ab -n 1000 -c 100 http://127.0.0.1:9010/

This results in about 100 requests per second, which is too slow. In the code above, the handler just makes one HTTP request to another site. I expected gevent to switch to another greenlet whenever one blocks, so the outbound request should have little effect on throughput. Instead, performance drops from 1600 requests per second to 100 requests per second, and I can't figure out why.

Could anyone explain this?

Answer 1:

Hi, your issue is that you aren't spawning an actual greenlet, and that the tornado.web.asynchronous decorator does not support WSGI servers.

But the main logic works, and I was able to get it working with an HTTP server (I don't know if you're tied to a WSGI server, but I guess not, since you can reverse proxy just as well).

I find a lot of people wanting to use gevent with tornado, me included (we use tornado and gevent at FriendCode), so I wrote this:

# Gevent monkeypatch
from gevent import monkey
monkey.patch_all()

# Gevent imports
import gevent

# Python imports
import functools

# Tornado imports
import tornado.ioloop
import tornado.web
import tornado.httpserver

# Request imports
import requests


# Asynchronous gevent decorator
def gasync(func):
    @tornado.web.asynchronous
    @functools.wraps(func)
    def f(self, *args, **kwargs):
        #self._auto_finish = False
        return gevent.spawn(func, self, *args, **kwargs)
    return f


# Constants
URL_TO_FETCH = 'http://google.co.uk/'

# Global
I = 0


class MainHandler(tornado.web.RequestHandler):
    @gasync
    def get(self):
        global I
        r = requests.get(URL_TO_FETCH)
        I += 1
        print('Got page %d (length=%d)' % (I, len(r.content)))
        self.write("Done")
        self.finish()


# Our URL Mappings
handlers = [
   (r"/", MainHandler),
]


def main():
    # Setup app and HTTP server
    application = tornado.web.Application(handlers)
    http_server = tornado.httpserver.HTTPServer(application)
    http_server.listen(9998)

    # Start ioloop
    tornado.ioloop.IOLoop.instance().start()


if __name__ == "__main__":
    main()

There are two key parts in this sample: the monkey patching, which you got right, and the gasync decorator that I wrote. All the decorator does is mark a method as asynchronous in Tornado's terms and run it in a greenlet. Asynchronous here means the method has to call self.finish() itself to send the response to the client. Tornado calls it automatically when the request is synchronous, but that is not what you want in the async case.

I hope that helps. The code works fine here; I've tested it with:

$ ab -n 100 -c 100 http://localhost:9998/

Which gives:

This is ApacheBench, Version 2.3 <$Revision: 655654 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking localhost (be patient).....done


Server Software:        TornadoServer/2.3
Server Hostname:        localhost
Server Port:            9998

Document Path:          /
Document Length:        4 bytes

Concurrency Level:      100
Time taken for tests:   0.754 seconds
Complete requests:      100
Failed requests:        0
Write errors:           0
Total transferred:      15900 bytes
HTML transferred:       400 bytes
Requests per second:    132.67 [#/sec] (mean)
Time per request:       753.773 [ms] (mean)
Time per request:       7.538 [ms] (mean, across all concurrent requests)
Transfer rate:          20.60 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        2    4   0.8      4       5
Processing:   379  572 104.4    593     748
Waiting:      379  572 104.4    593     748
Total:        383  576 104.3    596     752

Percentage of the requests served within a certain time (ms)
  50%    596
  66%    640
  75%    672
  80%    679
  90%    707
  95%    722
  98%    735
  99%    752
 100%    752 (longest request)

As you can see, the total time is roughly equal to the time of the longest request. Remember that when requests are handled asynchronously:

total_time = max(all_individual_times) + n*some_overhead

Where n is the number of requests and some_overhead is a constant per-request overhead.
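
The relation above can be checked with a small sketch. This one uses asyncio from the standard library as a stand-in for the gevent and Tornado setup in this answer, with asyncio.sleep standing in for the outbound HTTP request. The names fake_request, run_all and measure, and the delays passed in, are hypothetical and chosen only for illustration.

```python
import asyncio
import time


async def fake_request(delay):
    """Stand-in for one non-blocking request that takes `delay` seconds."""
    await asyncio.sleep(delay)
    return delay


async def run_all(delays):
    """Start every request at once and return the wall-clock time for all."""
    start = time.perf_counter()
    await asyncio.gather(*(fake_request(d) for d in delays))
    return time.perf_counter() - start


def measure(delays):
    """Return (total_time, longest_delay, sum_of_delays) for a concurrent run."""
    total = asyncio.run(run_all(delays))
    return total, max(delays), sum(delays)
```

Calling measure([0.1, 0.2, 0.3]) should report a total close to the longest delay (0.3 seconds) plus a small overhead, rather than the 0.6 seconds a serial run would need.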

Hope that helps :)



Answer 2:

I had the same need, but I'm working with futures and gen.coroutine, so I had to modify it a little to be compatible with my code. I'm putting it here in case anyone else needs it too:

#
# encoding: utf-8

from gevent import monkey
monkey.patch_all()

# Gevent imports
import gevent

# Python imports
import functools

# Tornado imports
import tornado.ioloop
import tornado.web
import tornado.gen
import tornado.httpserver

# Request imports
import requests
from tornado.concurrent import Future


# Asynchronous gevent decorator
def gfuture(func):
    @functools.wraps(func)
    def f(*args, **kwargs):
        loop = tornado.ioloop.IOLoop.current()
        future = Future()

        def call_method():
            try:
                result = func(*args, **kwargs)
                loop.add_callback(functools.partial(future.set_result, result))
            except Exception as e:
                loop.add_callback(functools.partial(future.set_exception, e))
        gevent.spawn(call_method)
        return future
    return f


# Constants
URL_TO_FETCH = 'http://google.com/'

# Global
I = 0


@gfuture
def gfetch(url, i):
    r = requests.get(url)
    return i


class MainHandler(tornado.web.RequestHandler):
    @tornado.web.asynchronous
    @tornado.gen.coroutine
    def get(self):
        global I
        I += 1
        n = I
        print("=> %s" % n)
        n = yield gfetch(URL_TO_FETCH, n)
        print("<= %s" % n)
        self.write("Done %s" % n)


# Our URL Mappings
handlers = [(r"/", MainHandler)]


def main():
    # Setup app and HTTP server
    application = tornado.web.Application(handlers)
    http_server = tornado.httpserver.HTTPServer(application)
    http_server.listen(9998)

    # Start ioloop
    tornado.ioloop.IOLoop.instance().start()


if __name__ == "__main__":
    main()


Answer 3:

Try testing the requests package on its own in a gevent-only sample application to check whether it is truly asynchronous. The cause may be that gevent doesn't correctly patch everything requests needs.
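
One way to run that check is sketched below, assuming gevent is installed. timed_concurrent is a hypothetical helper that runs the same call in several greenlets and reports the wall-clock time; check_requests and its default URL are likewise only illustrative. If requests cooperates with gevent, ten concurrent fetches should take about as long as one, not ten times longer.

```python
from gevent import monkey
monkey.patch_all()  # patch before importing anything that opens sockets

import time

import gevent


def timed_concurrent(fn, n):
    """Run fn in n greenlets at once and return the wall-clock seconds taken."""
    start = time.time()
    jobs = [gevent.spawn(fn) for _ in range(n)]
    gevent.joinall(jobs, raise_error=True)  # re-raise any greenlet failure
    return time.time() - start


def check_requests(url="http://google.com/", n=10):
    """Return (seconds for 1 fetch, seconds for n concurrent fetches)."""
    import requests  # imported here so the helper above works without it

    one = timed_concurrent(lambda: requests.get(url), 1)
    many = timed_concurrent(lambda: requests.get(url), n)
    return one, many
```

Call check_requests() from a shell with network access and compare the two numbers. The helper can also be tried offline with a patched blocking call, for example timed_concurrent(lambda: time.sleep(0.2), 10).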

But I think your solution is not asynchronous from gevent's side: you are trying to run two event loops. I guess the process you describe looks like this:

  • Tornado IOLoop waits for an event (for the HTTP server)
  • The HTTP request is handled
  • The handler runs its get method, which runs the gevent event loop (gevent starts the event loop implicitly in a dedicated greenlet)
  • gevent's event loop blocks the Tornado greenlet (the parent of the greenlet where gevent's event loop lives)
  • The request's greenlet waits to finish
  • The request's greenlet finishes, gevent's event loop is closed, and the Tornado IOLoop is released.

Look also at some successful combinations of Tornado and greenlets: tornalet, Motor.



Answer 4:

Tornado (Twisted) and gevent do the same things, and for best results you should stay within one technology stack rather than mixing the two. Either use a WSGI web framework such as bottle or flask with gevent, or use twisted and tornado together.
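
As a rough sketch of the gevent-only option, the example below serves a bare WSGI callable from gevent's own server, with no Tornado layer in between. It assumes gevent is installed. The callable app stands in for a bottle or flask application, time.sleep stands in for the blocking outbound request, and serve with its default port is a hypothetical helper.

```python
from gevent import monkey
monkey.patch_all()  # make blocking stdlib calls cooperative

import time

from gevent.pywsgi import WSGIServer


def app(environ, start_response):
    """Plain WSGI callable handled directly by gevent's server."""
    time.sleep(0.05)  # stand-in for a blocking call such as requests.get(...)
    body = b"hello"
    headers = [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]


def serve(port=9010):
    """Serve app on the given port until interrupted."""
    WSGIServer(("", port), app).serve_forever()
```

Calling serve() starts the server. Because the sleep is patched, each request yields to other greenlets while it waits, so concurrent requests overlap instead of queueing.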