Tornado Web and Threads

Published 2019-06-27 11:46

Question:

I am new to Tornado and Python threads. What I would like to achieve is the following: I have a Tornado web server which takes requests from users. I want to store some of the data locally in memory and write it periodically to a database as bulk inserts.

import threading
import time

import tornado.ioloop
import tornado.web
from tornado.options import define, options

define("port", default=8888, type=int, help="port to listen on")

# Keep userData locally in memory
UserData = {}

def background(f):
    """
    a threading decorator
    use @background above the function you want to thread
    (run in the background)
    """
    def bg_f(*a, **kw):
        threading.Thread(target=f, args=a, kwargs=kw).start()
    return bg_f

@background
def PostRecentDataToDBThread(iter=-1):
    global UserData  # rebind the module-level dict, not a function-local one
    i = 0
    while iter == -1 or i < iter:
        # send accumulated data to the DB, then reset the buffer
        UserData = {}
        time.sleep(5 * 60)
        i += 1

class AddHandler(tornado.web.RequestHandler):
    def post(self):
        userID = self.get_argument('ui')
        Data = self.get_argument('data')

        UserData[userID] = Data 


if __name__ == "__main__":
    tornado.options.parse_command_line()

    print("start PostRecentDataToDBThread")
    ### Start a thread that periodically flushes data to the database.
    ### The thread wakes up every 5 minutes.
    PostRecentDataToDBThread(-1)

    print("Started tornado on port: %d" % options.port)

    application = tornado.web.Application([
        (r"/add", AddHandler)
    ])
    application.listen(options.port)
    tornado.ioloop.IOLoop.instance().start()

Is this a good way to achieve my goal? I would like to minimize the time the server is blocked. Or should I rather use gevent or something else? Can I run into problems by accessing UserData from both Tornado and the thread? Data consistency is not critical here, as long as the server does not crash.

Answer 1:

Tornado is not intended to be used with multithreading. It is built around a single-threaded event loop (epoll on Linux) that switches context between the different parts of your code, so blocking calls in handlers and unsynchronized access to shared state from other threads are both problematic.
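That said, if you do stay with the thread-based approach from the question, the shared UserData dict should at least be swapped out atomically under a lock, so the flusher thread never iterates a dict the handlers are still mutating. A minimal stdlib sketch of that pattern (the `add_user_data` and `snapshot_and_reset` helpers are illustrative names, not Tornado APIs):

```python
import threading

user_data = {}
user_data_lock = threading.Lock()

def add_user_data(user_id, data):
    # Called from the request handler (IOLoop thread).
    with user_data_lock:
        user_data[user_id] = data

def snapshot_and_reset():
    # Called from the flusher thread: grab the whole dict and
    # replace it with a fresh one in a single locked step.
    global user_data
    with user_data_lock:
        snapshot = user_data
        user_data = {}
    return snapshot

add_user_data("u1", "a")
add_user_data("u2", "b")
batch = snapshot_and_reset()
print(sorted(batch))  # → ['u1', 'u2'] — the flusher owns this batch exclusively
print(user_data)      # → {} — handlers now write into a fresh dict
```

The swap is cheap (two locked pointer assignments), so the handlers are only blocked for microseconds even when the batch is large.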

In general I would recommend sending the data to a separate worker process via a message queue (for example pika + RabbitMQ, which integrates very well with Tornado). The worker process(es) can accumulate messages and write them to the database in batches, or you can implement any other data-processing logic on top of this setup.
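The accumulate-then-flush idea can be sketched in-process with the stdlib `queue` module; in a real deployment RabbitMQ would sit between the web process and the worker, and `flush_batch` below is just a stand-in for the actual database bulk insert:

```python
import queue
import threading

messages = queue.Queue()
flushed_batches = []

def flush_batch(batch):
    # Stand-in for the real bulk insert into the database.
    flushed_batches.append(list(batch))

def worker(batch_size=3):
    # Accumulate messages and write them out in batches;
    # None is used as a shutdown sentinel.
    batch = []
    while True:
        msg = messages.get()
        if msg is None:
            break
        batch.append(msg)
        if len(batch) >= batch_size:
            flush_batch(batch)
            batch = []
    if batch:  # flush whatever is left on shutdown
        flush_batch(batch)

t = threading.Thread(target=worker)
t.start()
for i in range(7):
    messages.put(("user%d" % i, "data"))
messages.put(None)
t.join()
print([len(b) for b in flushed_batches])  # → [3, 3, 1]
```

A real worker would typically also flush on a timer (e.g. every 5 minutes) so small batches don't sit in memory indefinitely.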

Alternatively you can use, for example, Redis with brukva to asynchronously write the incoming data to an in-memory store, which in turn dumps it to disk asynchronously depending on the Redis configuration.