Surrogate vs. natural/business keys [closed]

2018-12-31 05:43发布

Here we go again, the old argument still arises...

Would we better have a business key as a primary key, or would we rather have a surrogate id (i.e. an SQL Server identity) with a unique constraint on the business key field?

Please, provide examples or proof to support your theory.

19条回答
后来的你喜欢了谁
2楼-- · 2018-12-31 06:07

Using a surrogate key is better in my opinion as there is zero chance of it changing. Almost anything I can think of which you might use as a natural key could change (disclaimer: not always true, but commonly).

An example might be a DB of cars - on first glance, you might think that the licence plate could be used as the key. But these could be changed so that'd be a bad idea. You wouldnt really want to find that out after releasing the app, when someone comes to you wanting to know why they can't change their number plate to their shiny new personalised one.

查看更多
萌妹纸的霸气范
3楼-- · 2018-12-31 06:09

It appears that no one has yet said anything in support of non-surrogate (I hesitate to say "natural") keys. So here goes...

A disadvantage of surrogate keys is that they are meaningless (cited as an advantage by some, but...). This sometimes forces you to join a lot more tables into your query than should really be necessary. Compare:

select sum(t.hours)
from timesheets t
where t.dept_code = 'HR'
and t.status = 'VALID'
and t.project_code = 'MYPROJECT'
and t.task = 'BUILD';

against:

select sum(t.hours)
from timesheets t
     join departents d on d.dept_id = t.dept_id
     join timesheet_statuses s on s.status_id = t.status_id
     join projects p on p.project_id = t.project_id
     join tasks k on k.task_id = t.task_id
where d.dept_code = 'HR'
and s.status = 'VALID'
and p.project_code = 'MYPROJECT'
and k.task_code = 'BUILD';

Unless anyone seriously thinks the following is a good idea?:

select sum(t.hours)
from timesheets t
where t.dept_id = 34394
and t.status_id = 89    
and t.project_id = 1253
and t.task_id = 77;

"But" someone will say, "what happens when the code for MYPROJECT or VALID or HR changes?" To which my answer would be: "why would you need to change it?" These aren't "natural" keys in the sense that some outside body is going to legislate that henceforth 'VALID' should be re-coded as 'GOOD'. Only a small percentage of "natural" keys really fall into that category - SSN and Zip code being the usual examples. I would definitely use a meaningless numeric key for tables like Person, Address - but not for everything, which for some reason most people here seem to advocate.

See also: my answer to another question

查看更多
何处买醉
4楼-- · 2018-12-31 06:10

In the case of point in time database it is best to have combination of surrogate and natural keys. e.g. you need to track a member information for a club. Some attributes of a member never change. e.g Date of Birth but name can change. So create a Member table with a member_id surrogate key and have a column for DOB. Create another table called person name and have columns for member_id, member_fname, member_lname, date_updated. In this table the natural key would be member_id + date_updated.

查看更多
无与为乐者.
5楼-- · 2018-12-31 06:11

Surrogate keys are quite handy if you plan to use an ORM tool to handle/generate your data classes. While you can use composite keys with some of the more advanced mappers (read: hibernate), it adds some complexity to your code.

(Of course, database purists will argue that even the notion of a surrogate key is an abomination.)

I'm a fan of using uids for surrogate keys when suitable. The major win with them is that you know the key in advance e.g. you can create an instance of a class with the ID already set and guaranteed to be unique whereas with, say, an integer key you'll need to default to 0 or -1 and update to an appropriate value when you save/update.

UIDs have penalties in terms of lookup and join speed though so it depends on the application in question as to whether they're desirable.

查看更多
低头抚发
6楼-- · 2018-12-31 06:13

As a reminder it is not good practice to place clustered indices on random surrogate keys i.e. GUIDs that read XY8D7-DFD8S, as they SQL Server has no ability to physically sort these data. You should instead place unique indices on these data, though it may be also beneficial to simply run SQL profiler for the main table operations and then place those data into the Database Engine Tuning Advisor.

See thread @ http://social.msdn.microsoft.com/Forums/en-us/sqlgetstarted/thread/27bd9c77-ec31-44f1-ab7f-bd2cb13129be

查看更多
唯独是你
7楼-- · 2018-12-31 06:17

On a datawarehouse scenario I believe is better to follow the surrogate key path. Two reasons:

  • You are independent of the source system, and changes there --such as a data type change-- won't affect you.
  • Your DW will need less physical space since you will use only integer data types for your surrogate keys. Also your indexes will work better.
查看更多
登录 后发表回答