Lately I've been rethinking a database design I made a couple of months ago. The main reason is that last night I read the databse schema of vBulletin and saw that they use many, MANY, tables.
The current "idea" I'm using for my schema, for instance my log table, is to keep everything in one table by differencing the type of Log with an integer:
id, type, type_id, action, message
1 , 1, 305, 2, 'Explanation for user Ban'
2, 2, 1045, 1, 'Reason for deletion of Article'
Where type 1 = user, type 2 = article
, type_id = the ID of the user, article or w/e
and action 2 = ban, action 1 = deletion
.
Should I change the design to two tables logBans
, logSomething
and so on? or is it better to keep the method I'm currently using?
It depends. If you're going to have 1500000000 entries of type 1 and 1000 entries of type 2 and you'll be doing a LOT of queries on type 2, separate the tables. If not, it's more convenient to keep only one table.
Keep in mind scalability:
How many entries of each type will I have in 1 year?
How many requests on this table will I be doing ?
Can you, at some point, clear this log? Can you move it to another table (like archive entries older than X months) ?
I wouldn't do what vBulletin does. The problem with older apps like vBulletin is that while they might have started as lean-machines, over the time they collect a lot of entropy and end up being bloated. Since there are plugins, and third-party tools, and developers who've worked on the old code, breaking it is a tough choice.
That's why there is not much refactoring going on here. Don't make them your programming model. Look around, find out what works best and use that. A lot of table sounds like a bad thing to me, not good.
I would keep things as specific as possible - in this case I would create two tables. Each table has a specific purpose so I cannot see why you would combine them.
The issue here is subtyping. There are three basic approaches to dealing with subtypes.
Each strategy has its merits.
For example, (3) is particularly applicable if there is little to no difference between different subtypes. In your case, do different log records have extra columns if they're of a particular type? If they don't or there are few cases when they do putting them all in one table makes perfect sense.
(2) is common used for a Party table. This is a common model in CRMs that involves a parent Party object which has subtypes for Person and Organization (Organization may also have subtypes like Company, Association, etc). Person and Organization have different properties (eg salutation, given names, date of birth, etc for Person) so it makes sense to split this up rather than using nullable columns.
(2) is potentially more space efficient (although the overhead of NULL columns in modern DBMSs is very low). The bigger issue is that (2) might be more confusing to developers. You will get a situation where someone needs to store an extra field somewhere and will whack it in a column that's empty for that type simply because it's easier doing that than getting approval for the DBAs to add a column (no, I'm not kidding).
(1) is probably the least frequently used scheme of the 3 in my experience.
Lastly, scalability has to be considered and is probably the best case for (1). At a certain points JOINs don't scale effectively and you'll need to use some kind of partitioning scheme to cut down your table sizes. (1) is one method of doing that (but a crude method).
I wouldn't worry too much about that though. You'll typically need to get to hundreds of millions or billions of records before that becomes an issue (unless your records are really really large, in which case it'll happen sooner).
The one drawback I see right now is that you cannot enforce foreign key integrity on your type_id since it points to many different tables.
I want to add a small tip. A little off topic, and quite basic, but it's a lot clearer to use
enum
instead oftinyint
for status flags, i.e.If there are only two statuses,
tinyint
is a little more memory efficient, but less clear. Another disadvantage in enum is that you put a part of the business logic in the data tier - when you need to add or remove statuses, you have to alter the DB. Otherwise it's much more clear and I preferenum
.