I've got a web application built with Pyramid/SQLAlchemy/PostgreSQL that lets users manage some data, and that data is almost completely independent between users. Say, Alice visits alice.domain.com and can upload pictures and documents, and Bob visits bob.domain.com and can also upload pictures and documents. Alice never sees anything created by Bob and vice versa (this is a simplified example; in reality there may be a lot of data in multiple tables, but the idea is the same).
Now, the most straightforward way to organize the data in the DB backend is to use a single database where each table (pictures and documents) has a user_id field, so, basically, to get all of Alice's pictures I can do something like:
user_id = _figure_out_user_id_from_domain_name(request)
pictures = session.query(Picture).filter(Picture.user_id==user_id).all()
This is all easy and simple; however, there are some disadvantages:
- I need to remember to always add the extra filter condition when making queries, otherwise Alice may see Bob's pictures;
- If there are many users, the tables may grow huge;
- It may be tricky to split the web application between multiple machines.
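To make the single-database layout and its first disadvantage concrete, here is a minimal, self-contained sketch. It assumes SQLAlchemy 1.4 or newer and swaps in an in-memory SQLite database for PostgreSQL so it can run anywhere. The USER_IDS mapping and the user_id_from_host helper are hypothetical stand-ins for _figure_out_user_id_from_domain_name(request); they are not taken from the real application.

```python
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()


class Picture(Base):
    __tablename__ = "pictures"
    id = Column(Integer, primary_key=True)
    user_id = Column(Integer, nullable=False)  # owner of the row
    name = Column(String, nullable=False)


# Hypothetical mapping; a real application would look this up in a users table.
USER_IDS = {"alice": 1, "bob": 2}


def user_id_from_host(host):
    """Map 'alice.domain.com' (optionally with a port) to Alice's user id."""
    subdomain = host.split(":")[0].split(".")[0].lower()
    return USER_IDS[subdomain]


engine = create_engine("sqlite://")  # in-memory SQLite, for illustration only
Base.metadata.create_all(engine)
Session = sessionmaker(bind=engine)

session = Session()
session.add_all([
    Picture(user_id=1, name="alice_cat.png"),
    Picture(user_id=2, name="bob_dog.png"),
])
session.commit()

user_id = user_id_from_host("alice.domain.com")

# Correct: only Alice's rows come back.
filtered = session.query(Picture).filter(Picture.user_id == user_id).all()

# The filter was "forgotten": Bob's rows leak into Alice's request.
unfiltered = session.query(Picture).all()
```

Here filtered contains only Alice's picture, while unfiltered contains both rows, which is exactly the leak described in the first bullet.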
So I'm thinking it would be really nice to somehow split the data per-user. I can think of two approaches:
Option 1: Have separate tables for Alice's and Bob's pictures and documents within the same database (Postgres schemas seem to be the right tool for this):
documents_alice
documents_bob
pictures_alice
pictures_bob
and then, using some dark magic, "route" all queries to one or to the other table according to the current request's domain:
_use_dark_magic_to_configure_sqlalchemy('alice.domain.com')
pictures = session.query(Picture).all()  # selects all Alice's pictures from "pictures_alice" table
...
_use_dark_magic_to_configure_sqlalchemy('bob.domain.com')
pictures = session.query(Picture).all()  # selects all Bob's pictures from "pictures_bob" table
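One possible shape for the "dark magic", sketched under several assumptions: it uses one schema per tenant (rather than suffixed table names) together with SQLAlchemy's schema_translate_map execution option, and it assumes SQLAlchemy 1.4 or newer. SQLite stands in for PostgreSQL here, with each attached in-memory database playing the role of one schema. The names TENANTS, tenant_engine and session_for_host are hypothetical helpers invented for this illustration.

```python
from sqlalchemy import Column, Integer, String, create_engine, event
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class Picture(Base):
    __tablename__ = "pictures"  # no schema here; it is supplied per request
    id = Column(Integer, primary_key=True)
    name = Column(String, nullable=False)


TENANTS = ("alice", "bob")  # hypothetical, fixed list of schema names

engine = create_engine("sqlite://")  # in-memory SQLite, for illustration only


@event.listens_for(engine, "connect")
def attach_tenant_schemas(dbapi_connection, connection_record):
    # SQLite-only trick: an attached database behaves like a named schema.
    # The names come from the fixed TENANTS tuple, never from user input.
    for tenant in TENANTS:
        dbapi_connection.execute(f"ATTACH DATABASE ':memory:' AS {tenant}")


def tenant_engine(tenant):
    """Return a view of the engine that maps schema-less tables to `tenant`."""
    return engine.execution_options(schema_translate_map={None: tenant})


def session_for_host(host):
    """Hypothetical helper: 'alice.domain.com' -> session on schema 'alice'."""
    tenant = host.split(".")[0]
    if tenant not in TENANTS:
        raise LookupError(f"unknown tenant: {tenant!r}")
    return Session(bind=tenant_engine(tenant))


# Create the tables in every schema and add one row per tenant.
for tenant in TENANTS:
    Base.metadata.create_all(tenant_engine(tenant))
    with Session(bind=tenant_engine(tenant)) as session:
        session.add(Picture(name=f"{tenant}_photo.png"))
        session.commit()

# The same unfiltered query now only sees the current tenant's schema.
with session_for_host("alice.domain.com") as session:
    alice_pictures = [p.name for p in session.query(Picture).all()]

with session_for_host("bob.domain.com") as session:
    bob_pictures = [p.name for p in session.query(Picture).all()]
```

In a Pyramid application the call to session_for_host would presumably happen once per request, but how that is wired into the request lifecycle is not shown here.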
Option 2: Use a separate database for each user:
- database_alice
  - pictures
  - documents
- database_bob
  - pictures
  - documents
which seems like the cleanest solution, but I'm not sure if multiple database connections would require much more RAM and other resources, limiting the number of possible "tenants".
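A rough sketch of what the separate-database variant could look like, assuming SQLAlchemy 1.4 or newer: one lazily created engine (and therefore one connection pool) per tenant, kept in a registry. The DATABASE_URLS mapping, engine_for and session_for_host are hypothetical names, and in-memory SQLite URLs stand in for the real per-tenant PostgreSQL databases.

```python
import threading

from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class Document(Base):
    __tablename__ = "documents"
    id = Column(Integer, primary_key=True)
    title = Column(String, nullable=False)


# Hypothetical per-tenant URLs; with PostgreSQL each one would point at a
# different database (database_alice, database_bob, ...).
DATABASE_URLS = {"alice": "sqlite://", "bob": "sqlite://"}

_engines = {}
_engines_lock = threading.Lock()


def engine_for(tenant):
    """Create each tenant's engine (and its connection pool) once, lazily."""
    with _engines_lock:
        if tenant not in _engines:
            engine = create_engine(DATABASE_URLS[tenant])
            Base.metadata.create_all(engine)  # illustration only
            _engines[tenant] = engine
        return _engines[tenant]


def session_for_host(host):
    """Hypothetical helper: pick the engine from the request's subdomain."""
    tenant = host.split(".")[0]
    return Session(bind=engine_for(tenant))


with session_for_host("alice.domain.com") as session:
    session.add(Document(title="alice_notes.txt"))
    session.commit()

# Bob's database is physically separate, so nothing of Alice's is visible.
with session_for_host("bob.domain.com") as session:
    bob_titles = [d.title for d in session.query(Document).all()]

with session_for_host("alice.domain.com") as session:
    alice_titles = [d.title for d in session.query(Document).all()]
```

Because every engine owns its own pool, the number of open connections grows with the number of active tenants. With PostgreSQL, the pool_size and max_overflow arguments of create_engine bound the connections per engine, which is where the resource concern above comes from.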
So, the question is, does it all make sense? If yes, how do I configure SQLAlchemy to either modify the table names dynamically on each HTTP request (for option 1) or to maintain a pool of connections to different databases and use the correct connection for each request (for option 2)?