Should I use a single or multiple database setup f

Posted 2019-01-10 00:43

Question:

I am working on a PHP application that intends to ease company workflow and project management, let's say something like Basecamp and GoPlan.

I am not sure what the best approach is, database-wise. Should I use a single database and add client-specific columns to each of the tables, or should I create a database for each new client? An important factor is automation: I want it to be dead simple to create a new client (and perhaps to open up the possibility of signing up yourself).

Possible cons I can think of using one database:

  • Lack of extensibility
  • Security problems (although bugs shouldn't be there in the first place)

What are your thoughts on this? Do you have any ideas what solution the above companies are most likely to have chosen?

Answer 1:

I usually add ClientID to all tables and go with one database. But since the database is usually hard to scale I will also make it possible to run on different database instances for some or all clients.

That way you can have a bunch of small clients in one database and the big ones on separate servers.

A key factor for maintainability, though, is that you keep the schema identical in all databases. Managing schema versioning is enough of a headache without introducing client-specific schemas.
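The layout described in this answer could be sketched roughly as follows. This is only an illustration under stated assumptions: it uses Python's built-in sqlite3 with in-memory databases as a stand-in for real database servers, and the `projects` table, `ClientRouter` class, and method names are hypothetical, not taken from the answer.

```python
import sqlite3

# Hypothetical schema: every tenant-owned table carries a client_id column.
# The same schema is used for every database, shared or dedicated.
SCHEMA = """
CREATE TABLE projects (
    id        INTEGER PRIMARY KEY,
    client_id INTEGER NOT NULL,
    name      TEXT NOT NULL
);
CREATE INDEX idx_projects_client ON projects (client_id);
"""


def new_database():
    """Create a database with the one schema that all instances share."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


class ClientRouter:
    """Maps each client to a database: shared by default, dedicated for big clients."""

    def __init__(self):
        self.shared = new_database()
        self.dedicated = {}  # client_id -> connection

    def move_to_dedicated(self, client_id):
        """Give a client its own database and move its rows there."""
        target = new_database()
        rows = self.shared.execute(
            "SELECT id, client_id, name FROM projects WHERE client_id = ?",
            (client_id,),
        ).fetchall()
        target.executemany("INSERT INTO projects VALUES (?, ?, ?)", rows)
        self.shared.execute("DELETE FROM projects WHERE client_id = ?", (client_id,))
        self.dedicated[client_id] = target

    def connection_for(self, client_id):
        """Return the database that currently holds this client's data."""
        return self.dedicated.get(client_id, self.shared)
```

Because the rows keep their `client_id` even in a dedicated database, the application's queries stay the same wherever a client lives; only the connection lookup changes.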



Answer 2:

Listen to the Stack Overflow podcast where Joel and Jeff talk about the very same question. Joel talks about their experience offering a hosted version of their software. He points out that adding client IDs all over your DB complicates the design and code (are you sure you didn't accidentally forget to add it to some WHERE clause?) and complicates hosting features such as client-specific backups.

It was in episode #20 or #21 (check the transcripts for details).
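One common way to reduce the forgotten-WHERE risk in a shared database is to route all tenant queries through a small helper that adds the filter itself. The sketch below is an assumption-laden illustration, not something described in the podcast: `TenantScope`, its `TABLES` whitelist, and the `client_id` column name are all hypothetical, and sqlite3 stands in for the real DBMS.

```python
import sqlite3


class TenantScope:
    """Runs SELECTs for exactly one client; the client_id filter is added here, not by callers."""

    # Hypothetical whitelist of tenant-owned tables. Table names cannot be
    # bound as SQL parameters, so they are checked against this set instead.
    TABLES = {"projects", "tasks"}

    def __init__(self, conn, client_id):
        self.conn = conn
        self.client_id = client_id

    def select(self, table, columns="*", where="1=1", params=()):
        """Select rows of one table, always restricted to this scope's client.

        `columns` and `where` must be trusted, developer-written SQL fragments;
        user-supplied values belong in `params`.
        """
        if table not in self.TABLES:
            raise ValueError(f"unknown tenant table: {table}")
        sql = f"SELECT {columns} FROM {table} WHERE client_id = ? AND ({where})"
        return self.conn.execute(sql, (self.client_id, *params)).fetchall()
```

A call such as `TenantScope(conn, 1).select("projects", "name", "name = ?", ("c",))` then cannot return another client's rows, because the caller never writes the `client_id` condition by hand.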



Answer 3:

In my view, it will depend on your likely customer base. If you could get into a situation where arch-rivals are both using your system, then you would be better off with separate databases. It also depends on how multiple databases get implemented by your DBMS. If each database has a separate copy of the infrastructure, then that suggests a single database (or a change of DBMS). If multiple databases can be served by a single copy of the infrastructure, then I'd go for separate databases.

Think of database backup. Customer A says "Please send me a copy of my data". Much, much easier in a separate database setup than if a single database is shared. Think of removing a customer; again, much easier with separate databases.
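To make the backup and removal argument concrete, here is a minimal sketch assuming one SQLite database file per customer (the question is about MySQL, where the equivalent would be dumping or dropping one database). The function names and file layout are hypothetical.

```python
import os
import sqlite3


def backup_customer(db_path, backup_path):
    """Copy one customer's entire database; no per-table filtering is needed."""
    source = sqlite3.connect(db_path)
    target = sqlite3.connect(backup_path)
    try:
        source.backup(target)  # sqlite3.Connection.backup, available since Python 3.7
    finally:
        target.close()
        source.close()


def remove_customer(db_path):
    """Removing a customer amounts to deleting that customer's database."""
    os.remove(db_path)
```

In a shared database, the same two tasks would need a filtered export and a filtered delete for every table that holds customer data.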

(The 'infrastructure' part is mealy-mouthed because there are major differences between different DBMS about what constitutes a 'database' versus a 'server instance', for example. Add: The question is tagged 'mysql', so maybe those thoughts aren't completely relevant.)

Add: One more issue - with multiple customers in a single database, every SQL query is going to need to ensure that the data for the correct customer is chosen. That means that the SQL is going to be harder to write, and read, and the DBMS is going to have to work harder on processing the data, and indexes will be bigger, and ... I really would go with a separate database per customer for many purposes.

Clearly, StackOverflow (as an example) does not have a separate database per user; we all use the same database. But if you were running accounting systems for different companies, I don't think it would be acceptable (to the companies, and possibly not to the legal people) to share databases.



Answer 4:

  • DEVELOPMENT For rapid development, use a database per customer. Think how easy it will be to back up, restore, or delete a customer's data, or to measure, monitor, and bill usage. You won't need to write code to do it yourself; just use your database primitives.

  • PERFORMANCE For performance, use a database for all. Think about connection pooling, shared memory, caching, etc.

  • BUSINESS If your business plan is to have lots of small customers (think Hotmail), you should probably work on a single DB, and have all administrative tasks such as registration, deletion, data migration, etc. fully automated and exposed in a friendly interface. If you plan to have dozens or up to a few hundred big customers, then you can work with one DB per customer and have system administration scripts in place that can be operated by your customer support staff.



Answer 5:

The following screencast explains how it's done on salesforce.com. They use one database with a special column OrgId which identifies each tenant's data. There's much more to that so you should look into this. I'd go with their approach.

There's another great article about that on MSDN. It explains in depth when you should use a shared or isolated approach. Remember that having a shared DB for all your tenants has some important security implications, and if all of them share the same DB objects you might want to use row-level security, depending on the DBMS you use (I'm sure it's possible in MS SQL Server and Oracle, probably in IBM DB2 also). In MySQL you can use tricks (views + triggers) to achieve similar results.
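The views-plus-triggers idea could look roughly like the sketch below. It is an illustration only: it uses SQLite through Python's sqlite3 rather than MySQL, it is not real row-level security (SQLite has no per-user permissions, so nothing stops a caller from querying the base table), and the `projects` table, view naming scheme, and `create_tenant_view` function are hypothetical.

```python
import sqlite3


def create_tenant_view(conn, org_id):
    """Create a view exposing only one tenant's rows, plus a trigger that stamps inserts.

    Assumes a base table projects(id, org_id, name) already exists.
    Returns the name of the view.
    """
    org_id = int(org_id)  # interpolated into DDL below, so force an integer
    if org_id <= 0:
        raise ValueError("org_id must be a positive integer")
    view = f"projects_org_{org_id}"
    conn.executescript(f"""
        CREATE VIEW {view} AS
            SELECT id, name FROM projects WHERE org_id = {org_id};

        -- Inserts through the view always get this tenant's org_id.
        CREATE TRIGGER {view}_insert INSTEAD OF INSERT ON {view}
        BEGIN
            INSERT INTO projects (org_id, name) VALUES ({org_id}, NEW.name);
        END;
    """)
    return view
```

In a DBMS with user accounts, each tenant's database user would be granted access to its own view only, which is what turns this from a convenience into an access control.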



Answer 6:

For multitenancy, performance will typically increase the more resources you manage to share across tenants, see

http://en.wikipedia.org/wiki/Multitenancy

So if you can, go with the single database. I agree that security problems would only occur due to bugs, as you can implement all access control in the application. In some databases, you can still use the database access control by careful use of views (so that each authenticated user gets a different view).

There are ways to provide extensibility also. For example, you could create a single table with extension attributes (keyed by tenant, base record, and extension attribute id). Or you can create per-tenant extension tables, so that each tenant has his own extension schema.
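The first option, a single table of extension attributes keyed by tenant, base record, and attribute id, might be sketched like this. The table and function names are hypothetical, and sqlite3 is used only so the example runs on its own.

```python
import sqlite3

SCHEMA = """
CREATE TABLE projects (
    id        INTEGER PRIMARY KEY,
    tenant_id INTEGER NOT NULL,
    name      TEXT NOT NULL
);
-- Per-tenant definitions of extra fields.
CREATE TABLE extension_attributes (
    id        INTEGER PRIMARY KEY,
    tenant_id INTEGER NOT NULL,
    name      TEXT NOT NULL,
    UNIQUE (tenant_id, name)
);
-- One row per (tenant, base record, extension attribute).
CREATE TABLE project_extensions (
    tenant_id    INTEGER NOT NULL,
    project_id   INTEGER NOT NULL REFERENCES projects (id),
    attribute_id INTEGER NOT NULL REFERENCES extension_attributes (id),
    value        TEXT,
    PRIMARY KEY (tenant_id, project_id, attribute_id)
);
"""


def set_extension(conn, tenant_id, project_id, attr_name, value):
    """Set a tenant-specific attribute on a project, defining the attribute if needed."""
    conn.execute(
        "INSERT OR IGNORE INTO extension_attributes (tenant_id, name) VALUES (?, ?)",
        (tenant_id, attr_name),
    )
    (attr_id,) = conn.execute(
        "SELECT id FROM extension_attributes WHERE tenant_id = ? AND name = ?",
        (tenant_id, attr_name),
    ).fetchone()
    conn.execute(
        "INSERT OR REPLACE INTO project_extensions VALUES (?, ?, ?, ?)",
        (tenant_id, project_id, attr_id, value),
    )


def get_extensions(conn, tenant_id, project_id):
    """Return a project's extension attributes for one tenant as a dict."""
    rows = conn.execute(
        """SELECT a.name, e.value
           FROM project_extensions e
           JOIN extension_attributes a ON a.id = e.attribute_id
           WHERE e.tenant_id = ? AND e.project_id = ?""",
        (tenant_id, project_id),
    ).fetchall()
    return dict(rows)
```

The base `projects` table stays identical for every tenant; only the rows in the two extension tables differ.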



Answer 7:

When you're designing a multi-tenant database, you generally have three options:

  1. Have one database per tenant
  2. Have one schema per tenant
  3. Have all tenants share the same table(s)

The option you pick has implications on scalability, extensibility and isolation. These implications have been widely discussed across different StackOverflow questions and database articles.

In practice, each of the three design options can, with enough effort, address questions around scale, data that varies across tenants, and isolation. The decision depends on the primary dimension you’re building for. The summary:

  • If you're building for scale: Have all tenants share the same table(s)
  • If you're building for isolation: Create one database per tenant

For example, Google and Salesforce follow the first pattern and have their tenants share the same tables. Stack Overflow, on the other hand, follows the second pattern and keeps one database per tenant. The second approach is also more commonplace in regulated industries, such as healthcare.

The decision comes down to the primary dimension you're optimizing your database design for. This article on designing your SaaS database for scale talks about the trade-offs and provides a summary in the context of PostgreSQL.



Answer 8:

Another point to consider is that you may have a legal obligation to keep one company's data separate from another's.



Answer 9:

Having a database per client generally does not scale well. MySQL (and probably other databases) holds resources open per table; this does not lend itself well to 10k+ tables on one instance, which would happen in a large-scale multitenancy situation.

Of course, if you have some other issue which causes other problems before you get to this level, this may not be relevant.

Additionally, "sharding" a multi-tenant application is likely† to be the right thing to do eventually as your application gets bigger and bigger.

Sharding does not, however, mean one database (or instance) per tenant, but one per shard or set of shards, each of which may hold several tenants. You will need to discover the right tuning parameters for yourself, probably in production (hence it probably needs to be pretty tunable from the outset).

† I can't guarantee it.
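A tunable tenant-to-shard mapping of the kind described above could be sketched as follows. This is a hypothetical illustration, not the answerer's design: the `ShardMap` class and its methods are invented names, and hashing the tenant id is just one simple placement rule.

```python
import hashlib


class ShardMap:
    """Assigns tenants to a configurable number of shards; several tenants share each shard."""

    def __init__(self, shard_count):
        if shard_count < 1:
            raise ValueError("shard_count must be at least 1")
        self.shard_count = shard_count
        self.overrides = {}  # tenant_id -> shard index, for tenants placed by hand

    def pin(self, tenant_id, shard):
        """Pin a tenant (for example a very large one) to a specific shard."""
        if not 0 <= shard < self.shard_count:
            raise ValueError("shard index out of range")
        self.overrides[tenant_id] = shard

    def shard_for(self, tenant_id):
        """Return the shard index that holds this tenant's data."""
        if tenant_id in self.overrides:
            return self.overrides[tenant_id]
        # A stable hash, so the same tenant maps to the same shard in every process.
        digest = hashlib.sha256(str(tenant_id).encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.shard_count
```

Note that changing `shard_count` remaps most tenants under this rule, so a production system would more likely store each tenant's shard in a lookup table and move tenants deliberately.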



Answer 10:

You can start with a single database and partition it as the application grows. If you do this, there are a few things I would recommend:

1) Design the database in a way that it can be easily partitioned. For example, if customers are going to share data, make sure that data is easily replicated across each database.

2) When you have only one database, make sure it is being backed up to another physical server. In the event of a failure you can redirect traffic to this other server and still have your data intact.