My application is running on Google App Engine, and most requests constantly get a yellow flag due to high CPU usage. Using the profiler, I tracked the issue down to the routine that creates the jinja2.Environment instance.
I'm creating the instance at module level:
from jinja2 import Environment, FileSystemLoader
# TEMPLATE_DIRS: list of template directories, defined elsewhere in the app
jinja_env = Environment(loader=FileSystemLoader(TEMPLATE_DIRS))
Because of Google App Engine's operation mode (CGI), this code can run on each and every request (its module import cache seems to keep modules for seconds rather than minutes).
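If some requests never render a template, one option is to build the environment lazily instead of at import time. The sketch below assumes a hypothetical TEMPLATE_DIRS value and a hypothetical get_env() helper, memoized with modern Python's functools.lru_cache (not available on the Python 2.5 runtime shown in the profile). auto_reload=False is a real Environment option that skips the up-to-date check on templates already in the environment's cache.

```python
import functools

from jinja2 import Environment, FileSystemLoader

# Hypothetical setting; in the question TEMPLATE_DIRS comes from the app itself.
TEMPLATE_DIRS = ["templates"]


@functools.lru_cache(maxsize=None)
def get_env():
    """Build the Environment on first use and reuse it for the rest of the process."""
    return Environment(
        loader=FileSystemLoader(TEMPLATE_DIRS),
        # Do not re-check template files for changes on every get_template();
        # edited templates then need a new process to be picked up.
        auto_reload=False,
    )
```

Handlers would call get_env().get_template(...) instead of using a module-level jinja_env. This only defers the cost: if App Engine re-imports the module on every request, each request that renders a template still pays for one construction.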
I was thinking about storing the environment instance in memcache, but it does not seem to be picklable. The FileSystemLoader instance seems to be picklable and can be cached, but I did not observe any substantial improvement in CPU usage with this approach.
Can anybody suggest a way to decrease the overhead of creating a jinja2.Environment instance?
Edit: below is the relevant part of the profiler output.
222172 function calls (215262 primitive calls) in 8.695 CPU seconds
ncalls tottime percall cumtime percall filename:lineno(function)
33 1.073 0.033 1.083 0.033 {google3.apphosting.runtime._apphosting_runtime___python__apiproxy.Wait}
438/111 0.944 0.002 2.009 0.018 /base/python_dist/lib/python2.5/sre_parse.py:385(_parse)
4218 0.655 0.000 1.002 0.000 /base/python_dist/lib/python2.5/pickle.py:1166(load_long_binput)
1 0.611 0.611 0.679 0.679 /base/data/home/apps/with-the-flow/1.331879498764931274/jinja2/environment.py:10()
It is a single call but, as far as I can see (and this is consistent across all my GAE-based apps), the most expensive one in the whole request-processing cycle.
According to this Google recipe, you can use memcache to cache the compiled template bytecode. You can also cache the template file contents themselves; both techniques are in the same recipe.
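A minimal sketch of the bytecode half of that idea, using Jinja2's MemcachedBytecodeCache. DictMemcacheClient and make_env are hypothetical: the client is an in-memory stand-in, since MemcachedBytecodeCache only needs an object with get(key) and set(key, value[, timeout]), which App Engine's memcache module presumably satisfies. Jinja2 serializes the compiled code with marshal, so this only helps where marshal is available.

```python
from jinja2 import DictLoader, Environment, MemcachedBytecodeCache


class DictMemcacheClient:
    """Hypothetical in-memory stand-in for a memcache client."""

    def __init__(self):
        self.store = {}

    def get(self, key):
        return self.store.get(key)

    def set(self, key, value, timeout=None):
        self.store[key] = value
        return True


client = DictMemcacheClient()
TEMPLATES = {"hello.html": "Hello {{ name }}!"}


def make_env():
    # A fresh Environment per "request", all sharing one cache backend.
    return Environment(
        loader=DictLoader(TEMPLATES),
        bytecode_cache=MemcachedBytecodeCache(client),
    )


# First request: the template is compiled and its bytecode stored in the cache.
first = make_env().get_template("hello.html").render(name="GAE")
# Second request: the bytecode is read back instead of recompiling the source.
second = make_env().get_template("hello.html").render(name="again")
print(first, second)
```

The second Environment skips lexing, parsing and code generation for that template; constructing the Environment object itself does not get any cheaper.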
OK, people, this is what I got today on #pocoo:
[20:59] zgoda: hello, i'd like to know if i could optimize my jinja2 environment creation process, the problem -> Optimizing Jinja2 Environment creation
[21:00] zgoda: i have profiler output from "cold" app -> http://paste.pocoo.org/show/107009/
[21:01] zgoda: and for "hot" -> http://paste.pocoo.org/show/107014/
[21:02] zgoda: i'm wondering if i could somewhat lower the CPU cost of creating environment for "cold" requests
[21:05] mitsuhiko: zgoda: put the env creation into a module that you import
[21:05] mitsuhiko: like
[21:05] mitsuhiko: from yourapplication.utils import env
[21:05] zgoda: it's already there
[21:06] mitsuhiko: hmm
[21:06] mitsuhiko: i think the problem is that the template are re-compiled each access
[21:06] mitsuhiko: unfortunately gae is incredible limited, i don't know if there is much i can do currently
[21:07] zgoda: i tried with jinja bytecache but it does not work on prod (its on on dev server)
[21:08] mitsuhiko: i know
[21:08] mitsuhiko: appengine does not have marshal
[21:12] zgoda: mitsuhiko: thank you
[21:13] zgoda: i was hoping i'm doing something wrong and this can be optimized...
[21:13] mitsuhiko: zgoda: next release will come with improved appengine support, but i'm not sure yet how to implement improved caching for ae
It looks like Armin (mitsuhiko) is aware of the problems with bytecode caching on App Engine and plans to improve Jinja2 to allow caching on GAE. I hope things will get better over time.
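Without a working bytecode cache, the other technique from the recipe, keeping template source in memcache, is still possible. Below is a rough sketch of such a loader built on Jinja2's BaseLoader.get_source hook; MemcachedSourceLoader and DictMemcacheClient are hypothetical names, and the dict-backed client stands in for memcache. It saves file reads, not compilation, so on its own it probably will not remove the CPU cost seen in the profile.

```python
from jinja2 import BaseLoader, DictLoader, Environment


class DictMemcacheClient:
    """Hypothetical in-memory stand-in for a memcache client."""

    def __init__(self):
        self.store = {}

    def get(self, key):
        return self.store.get(key)

    def set(self, key, value):
        self.store[key] = value
        return True


class MemcachedSourceLoader(BaseLoader):
    """Hypothetical loader that serves template source from a memcache-like client.

    On a miss it reads through the wrapped loader and stores the source.
    Jinja2 still compiles each template once per Environment.
    """

    def __init__(self, inner, client, prefix="jinja2/source/"):
        self.inner = inner
        self.client = client
        self.prefix = prefix

    def get_source(self, environment, template):
        key = self.prefix + template
        source = self.client.get(key)
        if source is None:
            # Cache miss: ask the wrapped loader (e.g. a FileSystemLoader),
            # which raises TemplateNotFound for unknown names.
            source, _filename, _uptodate = self.inner.get_source(environment, template)
            self.client.set(key, source)
        # No filename; report "always up to date" because the cache is authoritative.
        return source, None, lambda: True


client = DictMemcacheClient()
env = Environment(
    loader=MemcachedSourceLoader(DictLoader({"hello.html": "Hello {{ name }}!"}), client)
)
rendered = env.get_template("hello.html").render(name="GAE")
print(rendered)
```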
Armin suggested pre-compiling Jinja2 templates to Python code and using the compiled templates in production. So I wrote a compiler/loader for that, and it now renders some complex templates 13 times faster by skipping all the parsing overhead. The related discussion, with a link to the repository, is here.
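Later Jinja2 releases ship a built-in form of this idea: Environment.compile_templates writes each template out as a Python module, and ModuleLoader loads those modules at runtime. This is not the author's compiler/loader, only a sketch of the same approach with the stock API; the temporary "compiled_templates" directory and the in-memory DictLoader are placeholders for a real build step and a FileSystemLoader.

```python
import os
import tempfile

from jinja2 import DictLoader, Environment, ModuleLoader

TEMPLATES = {"hello.html": "Hello {{ name }}!"}

# Build step (e.g. before deploying): compile every template the loader can
# list into Python source. zip=None writes plain .py files into a directory.
target = os.path.join(tempfile.mkdtemp(), "compiled_templates")
build_env = Environment(loader=DictLoader(TEMPLATES))
build_env.compile_templates(target, zip=None)

# Production: import the precompiled modules; no lexing or parsing at runtime.
prod_env = Environment(loader=ModuleLoader(target))
rendered = prod_env.get_template("hello.html").render(name="GAE")
print(rendered)
```

In a real deployment, compile_templates would run once at build time and the compiled directory would be uploaded with the app, so production instances never parse template source.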