I inherited a database built with the idea that composite keys are much more ideal than using a unique object ID field and that when building a database, a single unique ID should never be used as a primary key. Because I was building a Rails front-end for this database, I ran into difficulties getting it to conform to the Rails conventions (though it was possible using custom views and a few additional gems to handle composite keys).
The reasoning behind this specific schema design from the person who wrote it had to do with how the database handles ID fields in a non-efficient manner and when it's building indexes, tree sorts are flawed. This explanation lacked any depth and I'm still trying to wrap my head around the concept (I'm familiar with using composite keys, but not 100% of the time).
Can anyone offer opinions or add any greater depth to this topic?
I've been developing database applications for 15 years and I have yet to come across a case where a non-surrogate key was a better choice than a surrogate key.
I'm not saying that such a case does not exist, I'm just saying when you factor in the practical issues of actually developing an application that accesses the database, usually the benefits of a surrogate key start to overwhelm the theoretical purity of non-surrogate keys.
the primary key should be constant and meaningless; non-surrogate keys usually fail one or both requirements, eventually
if the key is not constant, you have a future update issue that can get quite complicated
if the key is not meaningless, then it is more likely to change, i.e. not be constant; see above
take a simple, common example: a table of Inventory items. It may be tempting to make the item number (sku number, barcode, part code, or whatever) the primary key, but then a year later all the item numbers change and you're left with a very messy update-the-whole-database problem...
EDIT: there's an additional issue that is more practical than philosophical. In many cases you're going to find a particular row somehow, then later update it or find it again (or both). With composite keys there is more data to keep track of and more contraints in the WHERE clause for the re-find or update (or delete). It is also possible that one of the key segments may have changed in the meantime!. With a surrogate key, there is always only one value to retain (the surrogate ID) and by definition it cannot change, which simplifies the situation significantly.
using natural keys makes a nightmare using any automatic ORM as persistence layer. Also, foreign keys on multiple column tend to overlap one another and this will give further problem when navigating and updating the relationship in a OO way.
Still you could transform the natural key in an unique constrain and add an auto generated id; this doesn't remove the problem with the foreign keys, though, those will have to be changed by hand; hopefully multiple columns and overlapping constraints will be a minority of all the relationship, so you could concentrate on refactoring where it matter most.
natural pk have their motivation and usages scenario and are not a bad thing(tm), they just tend to not get along well with ORM.
my feeling is that as any other concepts, natural keys and table normalization should be used when sensible and not as blind design constraints
I am not a experienced one but still i m in favor of Using primary key as id here is the explanation using an example..
The format of external data may change over time. For example, you might think that the ISBN of a book would make a good primary key in a table of books. After all, ISBNs are unique. But as this particular book is being written, the publishing industry in the United States is gearing up for a major change as additional digits are added to all ISBNs. If we’d used the ISBN as the primary key in a table of books, we’d have to update each row to reflect this change. But then we’d have another problem. There’ll be other tables in the database that reference rows in the books table via the primary key. We can’t change the key in the books table unless we first go through and update all of these references. And that will involve dropping foreign key constraints, updating tables, updating the books table, and finally reestablishing the constraints. All in all, this is something of a pain. The problems go away if we use our own internal value as a primary key. No third party can come along and arbitrarily tell us to change our schema—we control our own keyspace. And if something such as the ISBN does need to change, it can change without affecting any of the existing relationships in the database. In effect, we’ve decoupled the knitting together of rows from the external representation of data in those rows.
Although the explanation is quite bookish but i think it explains the things in a simpler way.
It sounds like the person who created the database is on the natural keys side of the great natural keys vs. surrogate keys debate.
I've never heard of any problems with btrees on ID fields, but I also haven't studied it in any great depth...
I fall on the surrogate key side: You have less repetition when using a surrogate key, because you're only repeating a single value in the other tables. Since humans rarely join tables by hand, we don't care if it's a number or not. Also, since there's only one fixed-size column to look up in the index, it's safe to assume surrogates have a faster lookup time by primary key as well.
I just wanted to add something here that I don't ever see covered when discussing auto-generated integer identity fields with relational databases (because I see them a lot), and that is, it's base type can an will overflow at some point.
Now I'm not trying to say this automatically makes composite ids the way to go, but it's just a matter of fact that even though more data could be logically added to a table (which is still unique), the single auto-generated integer identity could prevent this from happening.
Yes I realize that for most situations it's unlikely, and using a 64bit integer gives you lots of headroom, and realistically the database probably should have been designed differently if an overflow like this ever occurred.
But that doesn't prevent someone from doing it... a table using a single auto-generated 32bit integer as it's identity, which is expected to store all transactions at a global level for a particular fast-food company, is going fail as soon as it tries to insert it's 2,147,483,648th transaction (and that is a completely feasible scenario).
It's just something to note, that people tend to gloss over or just ignore entirely. If any table is going to be inserted into with regularity, considerations should be made to just how often and how much data will accumulate over time, and whether or not an integer based identifier should even be used.