When designing tables, I've developed a habit of having one column that is unique and that I make the primary key. This is achieved in three ways depending on requirements:
1. An identity integer column that auto-increments.
2. A unique identifier (GUID).
3. A short character(x) or integer (or other relatively small numeric type) column that can serve as a row identifier.
The third option would be used for fairly small, mostly-read lookup tables that might have a unique fixed-length string code, or a numeric value such as a year or other number.
For the most part, all other tables will either have an auto-incrementing integer or unique identifier primary key.
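To make the three options concrete, here is a minimal sketch using SQLite through Python's `sqlite3` module rather than SQL Server. The table and column names (`customer`, `document`, `currency`) are made up for illustration, and SQLite has no native GUID type, so the GUID is generated in Python and stored as text.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- 1. Auto-incrementing integer key
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY AUTOINCREMENT,
        name        TEXT NOT NULL
    );
    -- 2. GUID key, stored as text (assumption: no native GUID type here)
    CREATE TABLE document (
        document_id TEXT PRIMARY KEY,
        title       TEXT NOT NULL
    );
    -- 3. Short fixed-length code for a small, mostly-read lookup table
    CREATE TABLE currency (
        currency_code CHAR(3) PRIMARY KEY,
        name          TEXT NOT NULL
    );
""")

# The database assigns the integer key; the application only reads it back.
cur = conn.execute("INSERT INTO customer (name) VALUES (?)", ("Alice",))
customer_id = cur.lastrowid

# The GUID is generated client-side before the insert.
document_id = str(uuid.uuid4())
conn.execute("INSERT INTO document VALUES (?, ?)", (document_id, "Spec"))

# The lookup row is identified by its own short code.
conn.execute("INSERT INTO currency VALUES (?, ?)", ("USD", "US dollar"))
```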
The Question :-)
I have recently started working with databases that have no consistent row identifier and primary keys are currently clustered across various columns. Some examples:
- datetime/character
- datetime/integer
- datetime/varchar
- char/nvarchar/nvarchar
Is there a valid case for this? I would have always defined an identity or unique identifier column for these cases.
In addition there are many tables without primary keys at all. What are the valid reasons, if any, for this?
I'm trying to understand why tables were designed as they were, and it appears to be a big mess to me, but maybe there were good reasons for it.
A third question, to help me decipher the answers: in cases where multiple columns make up a compound primary key, is there a specific advantage to this method vs. a surrogate/artificial key? I'm thinking mostly of performance, maintenance, administration, and so on.
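For reference, the two designs being compared could look roughly like this. It is only a sketch in SQLite via Python's `sqlite3`, with a hypothetical `reading` table standing in for the datetime/integer case; the real tables and the SQL Server specifics will differ. Note that the surrogate design still needs a unique constraint on the natural columns if duplicates must be rejected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Design A: compound natural key (datetime + integer)
    CREATE TABLE reading_natural (
        read_at   TEXT    NOT NULL,
        sensor_id INTEGER NOT NULL,
        value     REAL    NOT NULL,
        PRIMARY KEY (read_at, sensor_id)
    );
    -- Design B: surrogate key, natural columns kept unique by a constraint
    CREATE TABLE reading_surrogate (
        reading_id INTEGER PRIMARY KEY,
        read_at    TEXT    NOT NULL,
        sensor_id  INTEGER NOT NULL,
        value      REAL    NOT NULL,
        UNIQUE (read_at, sensor_id)
    );
""")

def insert_twice(table):
    """Insert the same (read_at, sensor_id) twice; True if the duplicate is rejected."""
    sql = f"INSERT INTO {table} (read_at, sensor_id, value) VALUES (?, ?, ?)"
    conn.execute(sql, ("2024-01-01T00:00:00", 7, 1.5))
    try:
        conn.execute(sql, ("2024-01-01T00:00:00", 7, 2.5))
    except sqlite3.IntegrityError:
        return True
    return False

natural_rejects = insert_twice("reading_natural")
surrogate_rejects = insert_twice("reading_surrogate")
```

Both designs protect the data equally here; the difference shows up in what other tables have to carry as a foreign key (two columns in design A, one in design B).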
Besides all those good answers, I just want to share a good article I just read, The great primary-key debate.
Just to quote a few points:
The developer must apply a few rules when choosing a primary key for each table:
Natural keys (tend to) break the rules. Surrogate keys comply with the rules. (Do read through that article; it is worth your time!)
If you really want to read through all of the back and forth on this age-old debate, do a search for "natural key" on Stack Overflow. You should get back pages of results.
I follow a few rules:
On surrogate vs natural key, I refer to the rules above. If the natural key is small and will never change, it can be used as a primary key. If the natural key is large or likely to change, I use a surrogate key. If there is no natural key at all, I still add a surrogate key, because experience shows you will always add tables to your schema later and wish you'd put a primary key in place.
All tables should have a primary key. Otherwise, what you have is a heap. In some situations this might be what you want (for instance, under a heavy insert load where the data is then replicated via Service Broker to another database or table).
For lookup tables with a low volume of rows, you can use a CHAR(3) code as the primary key. This takes less room than an INT, but the performance difference is negligible. Other than that, I would always use an INT, unless you have a reference table that has a composite primary key made up of foreign keys from the associated tables.
I'll be up-front about my preference for natural keys - use them where possible, as they'll make your life of database administration a lot easier. I established a standard in our company that all tables have the following columns:
SUSER_SNAME() in T-SQL))
Row ID has a unique key on it per table, and in any case is auto-generated per row (and permissions prevent anyone editing it), and is reasonably guaranteed to be unique across all tables and databases. If any ORM systems need a single ID key, this is the one to use.
Meanwhile, the actual PK is, if possible, a natural key. My internal rules are something like:
EventId, AttendeeId)
So ideally you end up with a natural, human-readable and memorable PK, and an ORM-friendly one-ID-per-table GUID.
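A rough sketch of that combination, written for SQLite via Python's `sqlite3` rather than T-SQL. The `event_attendee` table and `row_id` column are illustrative names, not the actual company standard, and the GUID is generated in Python because SQLite has no server-side GUID default.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE event_attendee (
        event_id    INTEGER NOT NULL,
        attendee_id INTEGER NOT NULL,
        row_id      TEXT    NOT NULL UNIQUE,   -- GUID, one per row, for ORMs
        PRIMARY KEY (event_id, attendee_id)    -- natural, human-readable key
    )
""")

def add_attendee(event_id, attendee_id):
    """Insert a row and return the GUID generated for it."""
    row_id = str(uuid.uuid4())
    conn.execute(
        "INSERT INTO event_attendee (event_id, attendee_id, row_id) VALUES (?, ?, ?)",
        (event_id, attendee_id, row_id),
    )
    return row_id

row_id = add_attendee(10, 42)

# Lookup by the natural key, as a person or a report would do it.
by_natural = conn.execute(
    "SELECT row_id FROM event_attendee WHERE event_id = ? AND attendee_id = ?",
    (10, 42),
).fetchone()

# Lookup by the single-column GUID, as an ORM would do it.
by_row_id = conn.execute(
    "SELECT event_id, attendee_id FROM event_attendee WHERE row_id = ?",
    (row_id,),
).fetchone()
```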
Caveat: the databases I maintain tend to the 100,000s of records rather than millions or billions, so if you have experience of larger systems which contraindicates my advice, feel free to ignore me!
Here are my own rules of thumb, which I have settled on after 25+ years of development experience.
The primary key is used by the database for optimization purposes and should not be used by your application for anything more than identifying a particular entity or relating to a particular entity.
Always having a single value primary key makes performing UPSERTs very straightforward.
Use additional indices to support multi-column keys which have meaning in your application.
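The last two rules can be sketched together. This is a minimal example in SQLite via Python's `sqlite3`, assuming SQLite 3.24 or newer for the `ON CONFLICT ... DO UPDATE` syntax; in SQL Server you would reach for something like `MERGE` instead. The `product` table and the `upsert_product` helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product (
        product_id INTEGER PRIMARY KEY,   -- single-value key, identity only
        vendor     TEXT NOT NULL,
        sku        TEXT NOT NULL,
        price      REAL NOT NULL
    );
    -- Multi-column key that has meaning in the application
    CREATE UNIQUE INDEX ux_product_vendor_sku ON product (vendor, sku);
""")

def upsert_product(product_id, vendor, sku, price):
    """Insert the row, or update it in place if product_id already exists."""
    conn.execute(
        """
        INSERT INTO product (product_id, vendor, sku, price)
        VALUES (?, ?, ?, ?)
        ON CONFLICT (product_id) DO UPDATE SET
            vendor = excluded.vendor,
            sku    = excluded.sku,
            price  = excluded.price
        """,
        (product_id, vendor, sku, price),
    )

upsert_product(1, "acme", "A-100", 9.99)    # inserts
upsert_product(1, "acme", "A-100", 12.50)   # updates the same row
rows = conn.execute(
    "SELECT product_id, vendor, sku, price FROM product"
).fetchall()
```

With a single-column key the conflict target is one column, while the unique index still stops a second row from reusing the same `(vendor, sku)` pair.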