We're setting up a Python REST web application. Right now we're using WSGI, but we might change that in the future (using Twisted, for example, to improve scalability or some other aspect). I would really appreciate some guidance on what is considered a good architecture for a web application in Python.
In general, our app serves dynamic content, processes a moderate-to-high volume of data from clients, makes fairly demanding database, network, and filesystem calls, and should be "easily" scalable (quotes because if a solution is great but somewhat tough to configure for scalability, it would still be considered good). We would probably like to evolve this into a highly parallel application in the mid-to-long term. Google App Engine is not an accepted suggestion, mainly because of its cost.
My questions are:
- Is using WSGI a good idea? Should we be looking into something like Twisted instead?
- Should we use Apache as a reverse proxy for our static files?
- Is there some different pattern or architecture that we should consider that I haven't mentioned? (Even if completely obvious).
Any help at all with this would be very appreciated. Thanks a lot!
A WSGI application will be fine; in my opinion this is mostly a backend and data-processing question, as that is where the more interesting architectural decisions come into play. I would look into using Celery ( http://celeryproject.org/ ) for your work distribution and backend scaling. Twisted would be a good choice, but it appears you already have that portion written as a WSGI application, so I would just extend it with Celery.
I do not know the scope of your project but I would design it with Celery in mind.
I would have my frontend endpoints be WSGI (because you already have that written) and write the backend to be distributed via messages. You would then have a pool of backend nodes that pull messages off the Celery queue and complete the required work. It would look sort of like:
Apache -> WSGI Containers -> Celery Message Queue -> Celery Workers.
The Apache nodes would sit behind a load balancer of some kind. This is a fairly simple architecture to scale and, if done correctly, a fairly reliable one. Design for failure in a system like this and you will be fine.
You might consider using gevent and zeromq (or any other "mq"; I have experience only with zeromq). It's easy to launch multiple gevent processes and have them talk to each other with zeromq. You can put them behind a load balancer; nginx works fine as a load balancer, and you can also use nginx to serve static files.
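A toy sketch of that pattern, assuming gevent and pyzmq are installed (pyzmq ships a gevent-cooperative variant as `zmq.green`). The endpoint name and message here are made up; in a real deployment the two sides would be separate processes using a `tcp://` or `ipc://` endpoint rather than `inproc://`:

```python
import gevent
from zmq import green as zmq  # gevent-friendly sockets: recv yields, not blocks

ctx = zmq.Context()
received = []

def worker():
    # PULL side binds; PUSH peers can then connect in any order.
    sock = ctx.socket(zmq.PULL)
    sock.bind("inproc://jobs")
    # Blocks only this greenlet; other greenlets keep running.
    received.append(sock.recv_string())

def producer():
    sock = ctx.socket(zmq.PUSH)
    sock.connect("inproc://jobs")
    sock.send_string("hello from producer")

gevent.joinall([gevent.spawn(worker), gevent.spawn(producer)])
```

PUSH/PULL also load-balances automatically: with several workers connected to the same endpoint, zeromq fans messages out among them round-robin.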
Also, with gevent you can use "low-level" web frameworks like Werkzeug and WebOb, Werkzeug being my personal choice.
Gevent has a built-in WSGI server that is very fast and stable, and Werkzeug converts the WSGI environment and request data into nice, easy-to-use objects.
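A minimal sketch of that combination, assuming gevent and Werkzeug are installed; the route, port, and response text are arbitrary:

```python
from gevent.pywsgi import WSGIServer
from werkzeug.wrappers import Request, Response

@Request.application
def app(request):
    # Werkzeug wraps the raw WSGI environ in a convenient Request object,
    # so query parameters, headers, etc. are already parsed for you.
    name = request.args.get("name", "world")
    return Response(f"hello {name}\n", mimetype="text/plain")

http_server = WSGIServer(("127.0.0.1", 8000), app)
# http_server.serve_forever()  # blocks; each connection runs in its own greenlet
```

Because `app` is plain WSGI, the same callable also runs unchanged under any other WSGI container if you later move away from gevent.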
http://www.gevent.org/
http://werkzeug.pocoo.org/
https://github.com/traviscline/gevent-zeromq
Here you can find some nice beginner articles about gevent, zeromq, and related topics:
http://blog.pythonisito.com
This is an interesting read too:
https://raw.github.com/strangeloop/2011-slides/master/Lindsay-DistributedGeventZmq.pdf