What's the best practice for primary keys in t

2018-12-31 18:36发布

When designing tables, I've developed a habit of having one column that is unique and that I make the primary key. This is achieved in three ways depending on requirements:

  1. Identity integer column that auto increments.
  2. Unique identifier (GUID)
  3. A short character(x) or integer (or other relatively small numeric type) column that can serve as a row identifier column

Number 3 would be used for fairly small lookup, mostly read tables that might have a unique static length string code, or a numeric value such as a year or other number.

For the most part, all other tables will either have an auto-incrementing integer or unique identifier primary key.

The Question :-)

I have recently started working with databases that have no consistent row identifier and primary keys are currently clustered across various columns. Some examples:

  • datetime/character
  • datetime/integer
  • datetime/varchar
  • char/nvarchar/nvarchar

Is there a valid case for this? I would have always defined an identity or unique identifier column for these cases.

In addition there are many tables without primary keys at all. What are the valid reasons, if any, for this?

I'm trying to understand why tables were designed as they were, and it appears to be a big mess to me, but maybe there were good reasons for it.

A third question to sort of help me decipher the answers: In cases where multiple columns are used to comprise the compound primary key, is there a specific advantage to this method vs. a surrogate/artificial key? I'm thinking mostly in regards to performance, maintenance, administration, etc.?

21条回答
余生无你
2楼-- · 2018-12-31 18:49

Besides all those good answers, I just want to share a good article I just read, The great primary-key debate.

Just to quote a few points:

The developer must apply a few rules when choosing a primary key for each table:

  • The primary key must uniquely identify each record.
  • A record’s primary-key value can’t be null.
  • The primary key-value must exist when the record is created.
  • The primary key must remain stable—you can’t change the primary-key field(s).
  • The primary key must be compact and contain the fewest possible attributes.
  • The primary-key value can’t be changed.

Natural keys (tend to) break the rules. Surrogate keys comply with the rules. (You better read through that article, it is worth your time!)

查看更多
深知你不懂我心
3楼-- · 2018-12-31 18:51

If you really want to read through all of the back and forth on this age-old debate, do a search for "natural key" on Stack Overflow. You should get back pages of results.

查看更多
其实,你不懂
4楼-- · 2018-12-31 18:53

I follow a few rules:

  1. Primary keys should be as small as necessary. Prefer a numeric type because numeric types are stored in a much more compact format than character formats. This is because most primary keys will be foreign keys in another table as well as used in multiple indexes. The smaller your key, the smaller the index, the less pages in the cache you will use.
  2. Primary keys should never change. Updating a primary key should always be out of the question. This is because it is most likely to be used in multiple indexes and used as a foreign key. Updating a single primary key could cause of ripple effect of changes.
  3. Do NOT use "your problem primary key" as your logic model primary key. For example passport number, social security number, or employee contract number as these "primary key" can change for real world situations.

On surrogate vs natural key, I refer to the rules above. If the natural key is small and will never change it can be used as a primary key. If the natural key is large or likely to change I use surrogate keys. If there is no primary key I still make a surrogate key because experience shows you will always add tables to your schema and wish you'd put a primary key in place.

查看更多
刘海飞了
5楼-- · 2018-12-31 18:58

All tables should have a primary key. Otherwise, what you have is a HEAP - this, in some situations, might be what you want (heavy insert load when the data is then replicated via a service broker to another database or table for instance).

For lookup tables with a low volume of rows, you can use a 3 CHAR code as the primary key as this takes less room than an INT, but the performance difference is negligible. Other than that, I would always use an INT unless you have a reference table that perhaps has a composite primary key made up from foreign keys from associated tables.

查看更多
长期被迫恋爱
6楼-- · 2018-12-31 18:58

I'll be up-front about my preference for natural keys - use them where possible, as they'll make your life of database administration a lot easier. I established a standard in our company that all tables have the following columns:

  • Row ID (GUID)
  • Creator (string; has a default of the current user's name (SUSER_SNAME() in T-SQL))
  • Created (DateTime)
  • Timestamp

Row ID has a unique key on it per table, and in any case is auto-generated per row (and permissions prevent anyone editing it), and is reasonably guaranteed to be unique across all tables and databases. If any ORM systems need a single ID key, this is the one to use.

Meanwhile, the actual PK is, if possible, a natural key. My internal rules are something like:

  • People - use surrogate key, e.g. INT. If it's internal, the Active Directory user GUID is an acceptable choice
  • Lookup tables (e.g. StatusCodes) - use a short CHAR code; it's easier to remember than INTs, and in many cases the paper forms and users will also use it for brevity (e.g. Status = "E" for "Expired", "A" for "Approved", "NADIS" for "No Asbestos Detected In Sample")
  • Linking tables - combination of FKs (e.g. EventId, AttendeeId)

So ideally you end up with a natural, human-readable and memorable PK, and an ORM-friendly one-ID-per-table GUID.

Caveat: the databases I maintain tend to the 100,000s of records rather than millions or billions, so if you have experience of larger systems which contraindicates my advice, feel free to ignore me!

查看更多
美炸的是我
7楼-- · 2018-12-31 18:59

Here are my own rule of thumbs I have settled on after 25+ years of development experience.

  • All tables should have a single column primary key that auto increments.
  • Include it in any view that is meant to be updateable
  • The primary key should not have any meaning in the context of your application. This means that it should not be a SKU, or an account number or an employee id or any other information that is meaningful to your application. It is merely a unique key associated with an entity.

The primary key is used by the database for optimization purposes and should not be used by your application for anything more than identifying a particular entity or relating to a particular entity.

Always having a single value primary key makes performing UPSERTs very straightforward.

Use additional indices to support multi-column keys which have meaning in your application.

查看更多
登录 后发表回答