I am working on a PHP application intended to ease company workflow and project management; think of something like Basecamp or GoPlan.
I am not sure what the best approach is, database-wise. Should I use a single database and add client-specific columns to each of the tables, or should I create a database for each new client? An important factor is automation: I want it to be dead simple to create a new client (and perhaps to open up the possibility for clients to sign up by themselves).
Possible cons I can think of for using one database:
- Lack of extensibility
- Security problems (although bugs shouldn't be there in the first place)
What are your thoughts on this? Do you have any ideas what solution the above companies are most likely to have chosen?
Another point to consider is that you may have a legal obligation to keep one company's data separate from another's.
You can start with a single database and partition it as the application grows. If you do this, there are a few things I would recommend:
1) Design the database so that it can be easily partitioned. For example, if customers are going to share data, make sure that data can easily be replicated across all the databases.
2) While you have only one database, make sure it is being backed up to another physical server. If the primary server fails, you can redirect traffic to this other server and still have your data intact.
For multitenancy, performance will typically increase the more resources you manage to share across tenants, see
http://en.wikipedia.org/wiki/Multitenancy
So if you can, go with the single database. I agree that security problems would only occur due to bugs, as you can implement all access control in the application. In some databases, you can still use the database access control by careful use of views (so that each authenticated user gets a different view).
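A minimal sketch of the "one view per tenant" idea, using SQLite only because it ships with Python. The `projects` table, its `tenant_id` column and the `create_tenant_view` helper are hypothetical names. SQLite has no user accounts or `GRANT`, so this only shows the filtering half; in MySQL you would additionally give each tenant's database user `SELECT` on its own view and nothing on the base table.

```python
import sqlite3

# Hypothetical shared schema: every row carries the id of the tenant that owns it.
SCHEMA = """
CREATE TABLE projects (
    id        INTEGER PRIMARY KEY,
    tenant_id INTEGER NOT NULL,
    name      TEXT NOT NULL
);
"""

def create_tenant_view(conn: sqlite3.Connection, tenant_id: int) -> str:
    """Create a view exposing only one tenant's projects and return its name."""
    # Views cannot take bind parameters, so the id is forced to int before embedding it.
    tid = int(tenant_id)
    view_name = f"projects_tenant_{tid}"
    conn.execute(
        f"CREATE VIEW IF NOT EXISTS {view_name} AS "
        f"SELECT id, name FROM projects WHERE tenant_id = {tid}"
    )
    return view_name

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT INTO projects (tenant_id, name) VALUES (?, ?)",
    [(1, "Website relaunch"), (2, "Q3 audit"), (1, "Hiring plan")],
)
view = create_tenant_view(conn, 1)
print([name for (_id, name) in conn.execute(f"SELECT id, name FROM {view} ORDER BY id")])
# → ['Website relaunch', 'Hiring plan']
```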
There are also ways to provide extensibility. For example, you could create a single table of extension attributes (keyed by tenant, base record, and extension attribute id). Or you could create per-tenant extension tables, so that each tenant has its own extension schema.
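The single-table variant could look roughly like the sketch below (SQLite again, for a self-contained example). All table, column and function names are hypothetical; the point is that values are keyed by `(tenant_id, record_id, attribute_id)`, so each tenant can declare its own custom fields without any schema change.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE projects (
    id        INTEGER PRIMARY KEY,
    tenant_id INTEGER NOT NULL,
    name      TEXT NOT NULL
);
-- Each tenant declares its own custom fields.
CREATE TABLE extension_attributes (
    id        INTEGER PRIMARY KEY,
    tenant_id INTEGER NOT NULL,
    name      TEXT NOT NULL,
    UNIQUE (tenant_id, name)
);
-- One value per (tenant, base record, extension attribute).
CREATE TABLE extension_values (
    tenant_id    INTEGER NOT NULL,
    record_id    INTEGER NOT NULL REFERENCES projects(id),
    attribute_id INTEGER NOT NULL REFERENCES extension_attributes(id),
    value        TEXT,
    PRIMARY KEY (tenant_id, record_id, attribute_id)
);
""")

def set_extension(conn, tenant_id, record_id, attr_name, value):
    """Store a custom field value, declaring the attribute for this tenant if needed."""
    conn.execute(
        "INSERT OR IGNORE INTO extension_attributes (tenant_id, name) VALUES (?, ?)",
        (tenant_id, attr_name),
    )
    (attr_id,) = conn.execute(
        "SELECT id FROM extension_attributes WHERE tenant_id = ? AND name = ?",
        (tenant_id, attr_name),
    ).fetchone()
    conn.execute(
        "INSERT OR REPLACE INTO extension_values VALUES (?, ?, ?, ?)",
        (tenant_id, record_id, attr_id, value),
    )

def get_extensions(conn, tenant_id, record_id):
    """Return a record's custom fields as a dict, visible only to the given tenant."""
    rows = conn.execute(
        """SELECT a.name, v.value
           FROM extension_values v
           JOIN extension_attributes a ON a.id = v.attribute_id
           WHERE v.tenant_id = ? AND v.record_id = ?""",
        (tenant_id, record_id),
    )
    return dict(rows)

conn.execute("INSERT INTO projects (tenant_id, name) VALUES (1, 'Website relaunch')")
set_extension(conn, 1, 1, "budget_code", "MKT-7")
print(get_extensions(conn, 1, 1))  # {'budget_code': 'MKT-7'}
print(get_extensions(conn, 2, 1))  # {}
```

The trade-off is the usual one for attribute/value tables: flexible per tenant, but every custom field is stored as text and needs a join to read back.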
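When you're designing a multi-tenant database, you generally have three options:
1) Create one database per tenant
2) Create one schema per tenant
3) Have all tenants share the same table(s)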
The option you pick has implications on scalability, extensibility and isolation. These implications have been widely discussed across different StackOverflow questions and database articles.
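In practice, each of the three design options can, with enough effort, address questions around scale, data that varies across tenants, and isolation. The decision depends on the primary dimension you're building for. The summary:
- If you're building for scale: have all tenants share the same table(s)
- If you're building for isolation: create one database per tenant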
For example, Google and Salesforce have their tenants share the same tables, while Stack Overflow keeps one database per tenant. The database-per-tenant approach is also more common in regulated industries, such as healthcare.
The decision comes down to the primary dimension you're optimizing your database design for. This article on designing your SaaS database for scale talks about the trade-offs and provides a summary in the context of PostgreSQL.
Having a database per client generally does not scale well. MySQL (and probably other databases) holds resources open per table, which does not lend itself well to the 10k+ tables on one instance that a large-scale multi-tenancy setup would produce.
Of course, if you have some other issue which causes other problems before you get to this level, this may not be relevant.
Additionally, "sharding" a multi-tenant application is likely (though I can't guarantee it) to be the right thing to do eventually, as your application grows.
Sharding does not, however, mean one database (or instance) per tenant, but one per shard or set of shards, each of which may hold several tenants. You will need to discover the right tuning parameters for yourself, probably in production, so the scheme needs to be fairly tunable from the outset.
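One way to keep "many tenants per shard" tunable is a small directory that records which shard each tenant lives on, instead of deriving it from a fixed formula. The sketch below is an assumption-laden illustration: `ShardDirectory`, the shard names and the "least-loaded shard" placement policy are all hypothetical, and in a real system the directory itself would be a table in a small central database.

```python
from collections import Counter

class ShardDirectory:
    """Hypothetical tenant -> shard directory; each shard holds many tenants."""

    def __init__(self, shards: list[str]) -> None:
        if not shards:
            raise ValueError("at least one shard is required")
        self.shards = list(shards)
        self.assignments: dict[int, str] = {}

    def register_tenant(self, tenant_id: int) -> str:
        """Place a new tenant on the shard that currently holds the fewest tenants."""
        load = Counter(self.assignments.values())
        shard = min(self.shards, key=lambda s: load[s])  # ties go to the earliest shard listed
        self.assignments[tenant_id] = shard
        return shard

    def shard_for(self, tenant_id: int) -> str:
        return self.assignments[tenant_id]

    def add_shard(self, name: str) -> None:
        """New capacity only affects tenants registered from now on."""
        self.shards.append(name)

    def move_tenant(self, tenant_id: int, shard: str) -> None:
        """Repoint a tenant; call only after its rows have been copied to `shard`."""
        if shard not in self.shards:
            raise ValueError(f"unknown shard {shard!r}")
        self.assignments[tenant_id] = shard

directory = ShardDirectory(["db1", "db2"])
print([directory.register_tenant(t) for t in (101, 102, 103)])  # ['db1', 'db2', 'db1']
directory.add_shard("db3")
print(directory.register_tenant(104))  # db3
print(directory.shard_for(101))  # db1 (existing tenants do not move)
```

Because placement is looked up rather than computed, adding a shard or moving one large tenant onto its own shard does not reshuffle everyone else, which is the kind of tunability the answer asks for.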
In my view, it will depend on your likely customer base. If you could get into a situation where arch-rivals are both using your system, then you would be better off with separate databases. It also depends on how multiple databases get implemented by your DBMS. If each database has a separate copy of the infrastructure, then that suggests a single database (or a change of DBMS). If multiple databases can be served by a single copy of the infrastructure, then I'd go for separate databases.
Think of database backup. Customer A says "Please send me a copy of my data". Much, much easier in a separate database setup than if a single database is shared. Think of removing a customer; again, much easier with separate databases.
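To make the backup and removal argument concrete, here is a sketch of the database-per-customer lifecycle, with one SQLite file standing in for each MySQL database (in MySQL the rough equivalents would be `CREATE DATABASE`, `mysqldump` of that one database, and `DROP DATABASE`). The function names, the file layout and the slug rule are hypothetical.

```python
import os
import sqlite3
import tempfile

SCHEMA = "CREATE TABLE projects (id INTEGER PRIMARY KEY, name TEXT NOT NULL);"

def tenant_db_path(root: str, tenant: str) -> str:
    # Only simple alphanumeric slugs, so a tenant name can never escape `root`.
    if not tenant.isalnum():
        raise ValueError(f"invalid tenant slug {tenant!r}")
    return os.path.join(root, f"tenant_{tenant}.sqlite3")

def create_tenant(root: str, tenant: str) -> None:
    """Sign-up: a new client gets a fresh database with the current schema."""
    conn = sqlite3.connect(tenant_db_path(root, tenant))
    conn.executescript(SCHEMA)
    conn.close()

def export_tenant(root: str, tenant: str, dest_path: str) -> None:
    """'Send me a copy of my data': copy the whole database, no per-row filtering."""
    src = sqlite3.connect(tenant_db_path(root, tenant))
    dst = sqlite3.connect(dest_path)
    with dst:
        src.backup(dst)
    dst.close()
    src.close()

def remove_tenant(root: str, tenant: str) -> None:
    """Removing a customer means deleting one database (one file here)."""
    os.remove(tenant_db_path(root, tenant))

root = tempfile.mkdtemp()
for slug in ("acme", "globex"):
    create_tenant(root, slug)

conn = sqlite3.connect(tenant_db_path(root, "acme"))
with conn:  # commits the insert on success
    conn.execute("INSERT INTO projects (name) VALUES ('Website relaunch')")
conn.close()

export_tenant(root, "acme", os.path.join(root, "acme_export.sqlite3"))
remove_tenant(root, "globex")
```

In a shared database, the same two operations become a filtered export and a `DELETE ... WHERE tenant_id = ?` across every table, which is where the extra effort comes from.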
(The 'infrastructure' part is mealy-mouthed because there are major differences between different DBMS about what constitutes a 'database' versus a 'server instance', for example. Add: The question is tagged 'mysql', so maybe those thoughts aren't completely relevant.)
Add: One more issue - with multiple customers in a single database, every SQL query is going to need to ensure that the data for the correct customer is chosen. That means that the SQL is going to be harder to write, and read, and the DBMS is going to have to work harder on processing the data, and indexes will be bigger, and ... I really would go with a separate database per customer for many purposes.
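If you do go with a shared database, the "every query must pick the right customer" burden can at least be centralized. The sketch below is one possible mitigation, not a refutation of the point above: `TenantScopedDB` and the `tasks` schema are hypothetical, and the composite index only keeps the extra `tenant_id` filter cheap; it does not make the index smaller.

```python
import sqlite3

class TenantScopedDB:
    """Hypothetical wrapper that binds every query to one tenant so none can forget the filter."""

    def __init__(self, conn: sqlite3.Connection, tenant_id: int) -> None:
        self.conn = conn
        self.tenant_id = tenant_id

    def select(self, table: str, where: str = "1=1", params: tuple = ()) -> list:
        # `table` and `where` must come from application code, never from user input;
        # only `params` are bound safely by the driver.
        sql = f"SELECT * FROM {table} WHERE tenant_id = ? AND ({where})"
        return self.conn.execute(sql, (self.tenant_id, *params)).fetchall()

    def insert(self, table: str, **columns) -> int:
        columns["tenant_id"] = self.tenant_id  # callers cannot write into another tenant
        names = ", ".join(columns)
        marks = ", ".join("?" for _ in columns)
        cur = self.conn.execute(
            f"INSERT INTO {table} ({names}) VALUES ({marks})", tuple(columns.values())
        )
        return cur.lastrowid

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tasks (
    id        INTEGER PRIMARY KEY,
    tenant_id INTEGER NOT NULL,
    title     TEXT NOT NULL,
    done      INTEGER NOT NULL DEFAULT 0
);
-- Leading with tenant_id lets the index serve the filter every query now carries.
CREATE INDEX idx_tasks_tenant_done ON tasks (tenant_id, done);
""")

acme, globex = TenantScopedDB(conn, 1), TenantScopedDB(conn, 2)
acme.insert("tasks", title="Draft brief")
globex.insert("tasks", title="Audit books")
acme.insert("tasks", title="Ship site", done=1)
print(acme.select("tasks", "done = ?", (0,)))  # [(1, 1, 'Draft brief', 0)]
```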
Clearly, Stack Overflow (as an example) does not have a separate database per user; we all use the same database. But if you were running accounting systems for different companies, I don't think it would be acceptable (to the companies, and possibly not to the legal people) to share databases.