PostgreSQL: is it better to use multiple databases or multiple schemas?


Question:

After this comment on one of my questions, I'm wondering whether it is better to use 1 database with X schemas, or X databases with 1 schema each.

My situation: I'm developing a web app where, when people register, I currently create a database (no, it's not a social network: everyone must have access to their own data and never see other users' data).

That's the approach I used for the previous version of my application (which is still running on MySQL): through the Plesk API, for every registration, I:

  1. Create a database user with limited privileges;
  2. Create a database that can be accessed only by the previously created user and by the superuser (for maintenance);
  3. Populate the database.

Now I'll need to do the same with PostgreSQL (the project is maturing and MySQL doesn't fulfill all the needs).
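For illustration, here is a minimal sketch of what steps 1-3 might look like in PostgreSQL if I go the one-database-many-schemas way (the role/schema names and credentials are made up, and this is an untested sketch, not my real code):

import pg

#Connect as the maintenance superuser - hypothetical credentials
pgConnect = pg.connect(dbname='myapp', host='localhost', user='postgres', passwd='superuser_pass')

#Hypothetical names for the new registration
new_user = 'customer_foo'
new_pass = 'generated_password'
new_schema = 'customer_foo_schema'

#1. Create a login role with limited privileges
pgConnect.query("CREATE ROLE " + new_user + " LOGIN PASSWORD '" + new_pass + "'")
#2. Create a schema that only this role (and superusers) can use
pgConnect.query("CREATE SCHEMA " + new_schema + " AUTHORIZATION " + new_user)
pgConnect.query("REVOKE ALL ON SCHEMA " + new_schema + " FROM PUBLIC")
#3. Populate the schema, e.g. with the common table structure
pgConnect.query("CREATE TABLE " + new_schema + ".settings (key text PRIMARY KEY, value text)")
pgConnect.close()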

I need all the database/schema backups to be independent: pg_dump works perfectly either way, and the same goes for the users, who can be configured to access just one schema or just one database.

So, assuming you are more experienced PostgreSQL users than I am, what do you think is the best solution for my situation, and why?

Will there be performance differences using X databases instead of X schemas? And which solution will be easier to maintain in the future (reliability)?

Edit: I almost forgot: all of my databases/schemas will always have the same structure!

Edit 2: Regarding backups (with pg_dump), it is probably better to use 1 database and many schemas and dump all the schemas at once: recovery is then quite simple, load the main dump on a dev machine, then dump and restore just the schema that is needed. There is one additional step, but dumping all the schemas seems faster than dumping them one by one.
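A rough sketch of that recovery path (paths and names are invented, and the main database is assumed to be called myapp):

import os

#Dump the whole production database (all the schemas at once)
os.system('pg_dump -U postgres myapp > /backups/myapp.sql')
#On the dev machine: load the full dump into a scratch database
os.system('createdb -U postgres myapp_scratch')
os.system('psql -U postgres -d myapp_scratch < /backups/myapp.sql')
#Dump just the schema that has to be recovered...
os.system('pg_dump -U postgres -n customer_foo_schema myapp_scratch > /backups/customer_foo_schema.sql')
#...and load it back into the main database (dropping the broken schema there first, if it still exists)
os.system('psql -U postgres -d myapp < /backups/customer_foo_schema.sql')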

P.S.: sorry if I forgot some 'w' chars in the text; my keyboard suffers from that button ;)

UPDATE 2012

Well, the application structure and design have changed a lot during those last two years. I'm still using the 1-db-with-many-schemas approach, but I also have 1 database for each version of my application:

Db myapp_01
    \_ my_customer_foo_schema
    \_ my_customer_bar_schema
Db myapp_02
    \_ my_customer_foo_schema
    \_ my_customer_bar_schema

For backups, I'm dumping each database regularly and then moving the backups to the dev server.

I'm also using PITR/WAL backups but, as I said before, it's unlikely I'll ever have to restore the whole database at once, so this will probably be dropped this year (in my situation it's not the best approach).

The 1-db-many-schemas approach has worked very well for me so far, even though the app structure has totally changed:

"I almost forgot: all of my databases/schemas will always have the same structure!"

...now every schema has its own structure, which changes dynamically in reaction to each user's data flow.

Answer 1:

A PostgreSQL "schema" is roughly the same as a MySQL "database". Having many databases on a PostgreSQL installation can get problematic; having many schemas will work with no trouble. So you definitely want to go with one database and multiple schemas within that database.



Answer 2:

Definitely, I'll go for the 1-db-many-schemas approach. It allows me to dump the whole database but restore just one schema very easily, in several ways:

  1. Dump the db (all the schemas), load the dump into a new db, dump just the schema I need, and restore it back into the main db;
  2. Dump the schemas separately, one by one (though I think the machine will suffer more this way - and I'm expecting around 500 schemas!)

Also, googling around I've seen that there is no built-in procedure to duplicate a schema (using one as a template), but many suggest this approach:

  1. Create a template-schema
  2. When you need to duplicate it, rename it to the new name
  3. Dump it
  4. Rename it back
  5. Restore the dump
  6. The magic is done.

I've written a few lines of Python to do that; I hope they can help someone (written in two seconds, don't use it in production):

import os
import sys
import pg

#Take the new schema name from the first command-line argument (sys.argv[0] is the script name)
newSchema = sys.argv[1]
#Temp folder for the dumps
dumpFile = '/test/dumps/' + str(newSchema) + '.sql'
#Settings
db_name = 'db_name'
db_user = 'db_user'
db_pass = 'db_pass'
schema_as_template = 'schema_name'

#Connection
pgConnect = pg.connect(dbname= db_name, host='localhost', user= db_user, passwd= db_pass)
#Rename schema with the new name
pgConnect.query("ALTER SCHEMA " + schema_as_template + " RENAME TO " + str(newSchema))
#Dump it
command = 'export PGPASSWORD="' + db_pass + '" && pg_dump -U ' + db_user + ' -n ' + str(newSchema) + ' ' + db_name + ' > ' + dumpFile
os.system(command)
#Rename back with its default name
pgConnect.query("ALTER SCHEMA " + str(newSchema) + " RENAME TO " + schema_as_template)
#Restore the previous dump to create the new schema
restore = 'export PGPASSWORD="' + db_pass + '" && psql -U ' + db_user + ' -d ' + db_name + ' < ' + dumpFile
os.system(restore)
#Want to delete the dump file?
os.remove(dumpFile)
#Close connection
pgConnect.close()
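
Assuming the snippet is saved as clone_schema.py (the filename is arbitrary), creating a new schema from the template is then a one-liner:

python clone_schema.py my_new_customer_schema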


Answer 3:

I would say, go with multiple databases AND multiple schemas :)

Schemas in postgres are a lot like packages in Oracle, in case you are familiar with those. Databases are meant to differentiate between entire sets of data, while schemas are more like data entities.

For instance, you could have one database for an entire application with the schemas "UserManagement", "LongTermStorage" and so on. "UserManagement" would then contain the "User" table, as well as all stored procedures, triggers, sequences etc. that are needed for the user management.
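To make that concrete, here is a small sketch of how such a layout could be created (same PyGreSQL module as the script above; credentials and table columns are invented):

import pg

pgConnect = pg.connect(dbname='myapp', host='localhost', user='postgres', passwd='secret')
#One schema per component of the application
pgConnect.query('CREATE SCHEMA "UserManagement"')
pgConnect.query('CREATE SCHEMA "LongTermStorage"')
#The User table (and the sequence behind its serial id) lives inside the UserManagement component
pgConnect.query('CREATE TABLE "UserManagement"."User" (id serial PRIMARY KEY, name text NOT NULL)')
#Every query states which component it is talking to
print(pgConnect.query('SELECT id, name FROM "UserManagement"."User"').getresult())
pgConnect.close()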

Databases are entire programs, schemas are components.



Answer 4:

A number of schemas should be more lightweight than a number of databases, although I cannot find a reference which confirms this.

But if you really want to keep things very separate (instead of refactoring the web application so that a "customer" column is added to your tables), you may still want to use separate databases: I'd argue that you can more easily restore a particular customer's database this way -- without disturbing the other customers.
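A minimal sketch of such a per-customer restore in the one-database-per-customer setup (database and file names are invented):

import os

#Only this customer's database is touched; all the other customers stay online, untouched
os.system('dropdb -U postgres customer_foo_db')
os.system('createdb -U postgres customer_foo_db')
os.system('psql -U postgres -d customer_foo_db < /backups/customer_foo_db.sql')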



Answer 5:

In a PostgreSQL context I recommend using one database with multiple schemas, since you can (for example) UNION ALL across schemas but not across databases. A database is completely insulated from other databases, whereas schemas are not insulated from the other schemas within the same database. If, for some reason, you have to consolidate data across schemas in the future, this is easy to do over multiple schemas; with multiple databases you would need multiple connections and would have to collect and merge the data from each database "manually" in application logic.
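For example, consolidating the same table across two customer schemas is a single query on a single connection (the orders table and the credentials are invented; the schema names are the ones from the question's update):

import pg

pgConnect = pg.connect(dbname='myapp_01', host='localhost', user='postgres', passwd='secret')
#One query, one connection - not possible across two separate databases
result = pgConnect.query(
    "SELECT 'foo' AS customer, count(*) FROM my_customer_foo_schema.orders "
    "UNION ALL "
    "SELECT 'bar' AS customer, count(*) FROM my_customer_bar_schema.orders")
print(result.getresult())
pgConnect.close()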

The latter has advantages in some cases, but for the most part I think the one-database-multiple-schemas approach is more useful.



Answer 6:

First, get things clear: most of the time you will want some data to be read-only and some read/write. So schemas that are used as read-only can be kept in a different database from the read/write schemas. That said, I would suggest keeping at most 25-30 schemas in one database, as you don't want the logging for all those schemas to put too much load on a single database.
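As an aside, a sketch of how a schema can be made effectively read-only for an application role (schema, role and database names are invented; GRANT ... ON ALL TABLES IN SCHEMA needs PostgreSQL 9.0+):

import pg

pgConnect = pg.connect(dbname='reporting_db', host='localhost', user='postgres', passwd='secret')
#The application role may read the archive schema but never write to it
pgConnect.query('GRANT USAGE ON SCHEMA archive TO app_user')
pgConnect.query('GRANT SELECT ON ALL TABLES IN SCHEMA archive TO app_user')
pgConnect.query('REVOKE INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA archive FROM app_user')
pgConnect.close()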

Here is one article if you want to read more.