I have an expensive function to include in my Tornado app. The function
returns several outputs but, for legacy reasons, these outputs are accessed
separately through different handlers.
Is there a way to execute the function only once, reuse the result across the
different handlers, and preserve Tornado's asynchronous behavior?
import tornado.web
from tornado.web import RequestHandler
from tornado.ioloop import IOLoop

# the expensive function
def add(x, y):
    z = x + y
    return x, y, z

# the handlers that reuse the function
class Get_X(RequestHandler):
    def get(self, x, y):
        x, y, z = add(int(x), int(y))
        self.write(str(x))

class Get_Y(RequestHandler):
    def get(self, x, y):
        x, y, z = add(int(x), int(y))
        self.write(str(y))

class Get_Z(RequestHandler):
    def get(self, x, y):
        x, y, z = add(int(x), int(y))
        self.write(str(z))

# the web service
application = tornado.web.Application([
    (r'/Get_X/(\d+)/(\d+)', Get_X),
    (r'/Get_Y/(\d+)/(\d+)', Get_Y),
    (r'/Get_Z/(\d+)/(\d+)', Get_Z),
])
application.listen(8888)
IOLoop.current().start()
I thought about caching the result of the function in a dictionary, but I'm not sure how to make the two other handlers wait while the first one creates the dictionary entry.
Tornado Futures are reusable, so you can simply save the Future before yielding it. Many off-the-shelf caching decorators (like Python 3.2's functools.lru_cache) will just work if you put them in front of @gen.coroutine:
import functools

from tornado import gen
from tornado.ioloop import IOLoop

@functools.lru_cache(maxsize=100)
@gen.coroutine
def expensive_function():
    print('starting expensive_function')
    yield gen.sleep(5)
    print('finished expensive_function')
    return 1, 2, 3

@gen.coroutine
def get_x():
    print('starting get_x')
    x, y, z = yield expensive_function()
    return x

@gen.coroutine
def get_y():
    print('starting get_y')
    x, y, z = yield expensive_function()
    return y

@gen.coroutine
def get_z():
    print('starting get_z')
    x, y, z = yield expensive_function()
    return z

@gen.coroutine
def main():
    x, y, z = yield [get_x(), get_y(), get_z()]
    print(x, y, z)

if __name__ == '__main__':
    IOLoop.current().run_sync(main)
Prints:
starting get_x
starting expensive_function
starting get_y
starting get_z
finished expensive_function
1 2 3
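If you'd rather not rely on a decorator, the same idea works by hand: cache the Future itself, not the finished result, and let every caller await it. Here is a minimal sketch of that pattern using stdlib asyncio (a done Future can be awaited any number of times); the names _cache, _compute, and call_count are ours, for illustration:

```python
import asyncio

_cache = {}      # hypothetical cache: maps arguments to the Future itself
call_count = 0   # counts how often the expensive body actually runs

async def _compute(x, y):
    global call_count
    call_count += 1
    await asyncio.sleep(0.01)  # stand-in for the expensive work
    return x, y, x + y

def expensive_function(x, y):
    # Cache the Future, not the result: callers that arrive while the
    # work is in flight await the same Future and share one computation.
    key = (x, y)
    if key not in _cache:
        _cache[key] = asyncio.ensure_future(_compute(x, y))
    return _cache[key]

async def main():
    # Three concurrent awaiters; the expensive body runs only once.
    return await asyncio.gather(
        expensive_function(2, 3),
        expensive_function(2, 3),
        expensive_function(2, 3),
    )

results = asyncio.run(main())
print(results, call_count)  # three identical tuples, one call
```

The key design point is that the dictionary entry is created synchronously, before the first await, so no second caller can slip in and start a duplicate computation.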
You're concerned about one handler taking time to calculate a value to be placed in the cache, while other handlers wait for the value to appear in the cache.
Tornado 4.2 includes an Event class you can use to coordinate the coroutines that want the cached value. When a handler wants to get a value from the cache, it checks if the cached value is there:
from tornado import gen, locks
from tornado.web import RequestHandler

cache = {}

class Get_X(RequestHandler):
    @gen.coroutine
    def get(self, x, y):
        key = (x, y, 'Get_X')
        if key in cache:
            value = cache[key]
            if isinstance(value, locks.Event):
                # Another coroutine has begun calculating.
                yield value.wait()
                value = cache[key]
            self.write(value)
            return

        # Calculate the value.
        cache[key] = event = locks.Event()
        value = calculate(x, y)
        cache[key] = value
        event.set()
        self.write(value)
This code is untested.
In real code, you should wrap calculate in a try / except that clears the Event from the cache if calculate fails. Otherwise, all other coroutines will wait forever for the Event to be set.
I assume calculate returns a string you can pass to self.write. In your application there might be further processing of the value before you can call self.write or self.render.
You should also consider how large your cache might grow: how large are the values, and how many distinct keys will there be? You may need a bounded cache that evicts the least-recently-used value; there are plenty of search results for "Python LRU cache", and you might try Raymond Hettinger's since he's widely respected.
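The core of a bounded LRU cache fits in a few lines with collections.OrderedDict; this is only a sketch to show the eviction idea, not a substitute for a battle-tested recipe:

```python
from collections import OrderedDict

class LRUCache:
    """Minimal bounded mapping that evicts the least-recently-used entry."""

    def __init__(self, maxsize=128):
        self.maxsize = maxsize
        self._data = OrderedDict()

    def __contains__(self, key):
        return key in self._data

    def __getitem__(self, key):
        self._data.move_to_end(key)  # a read marks the key as recently used
        return self._data[key]

    def __setitem__(self, key, value):
        if key in self._data:
            self._data.move_to_end(key)
        self._data[key] = value
        if len(self._data) > self.maxsize:
            self._data.popitem(last=False)  # evict the oldest entry

# Usage: with maxsize=2, touching 'a' makes 'b' the eviction victim.
cache = LRUCache(maxsize=2)
cache['a'] = 1
cache['b'] = 2
_ = cache['a']   # 'a' is now most recently used
cache['c'] = 3   # evicts 'b'
```

A real handler cache would also need `__delitem__` (for the failure path above) and, in a threaded setting, locking; in a single-threaded IOLoop neither coroutine can preempt a plain dict operation.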
For a more sophisticated example of RequestHandlers using Events to synchronize around a cache, see my proxy example in the Toro documentation. It's far from a full-featured web proxy, but the example is written to demonstrate a solution to the exact problem you present: how to avoid duplicate work when calculating a value to be placed in the cache.