What's the point of a candidate key?

2020-02-05 03:06发布

问题:

I'm fairly new to database management and this question never seems to be answered in more than one sentence. All other SO answers say "A candidate key is a minimal super key." That means nothing to me.

A candidate key is supposed to specify uniqueness of a db record, correct? And a primary key is a candidate key. If a primary key already specifies uniqueness, what's point of adding more candidate keys?

I have seen example records like the following:

Employee(ID, Name, PhoneNumber)

where ID is the primary key and PhoneNumber is a candidate key. From what I see, the ID is enough to specify the uniqueness of an employee record. Although PhoneNumbers are (probably) unique, specifying them as a candidate key does not seem "minimal" to me.

回答1:

It means that if PhoneNumber was indeed a candidate key you could delete the ID column and use PhoneNumber instead. In other words, it is a candidate for being a unique key.

Wikipedia has a more formal definition that you many want to look at.



回答2:

A key is called a candidate key, because while it could be used as a PK, it is not necessarily the PK.

There can be more than one candidate key for a given row, e.g., EmployeeID and SSN.

Often, rather than using a candidate key as the PK, a surrogate key is created instead. This is because decisions around what candidate key to use can be found to be erroneous later, which can cause a huge headache (literally).

Another reason is that a surrogate key can be created using an efficient data type for indexing purposes, which the candidate keys may not have (e.g., a UserImage).

A third reason is that many ORMs work only with a single-column PK, so candidate keys composed of more than one column (composite keys) are ruled out in that case.

Something that many developers do not realize is that selecting a surrogate key over a natural key may be a compromise in terms of data integrity. You may be losing some constraints on your data by selecting a surrogate key, and often a trigger is required to simulate the constraint if a surrogate key is chosen.



回答3:

Candidate key is a concept that appears when you are designing a database system.

Suppose your system will have a table named User, defined as below:

User (fullName, socialSecurityNumber, creditCardNumber, age).

Well, you have to choose which subset of these columns will be your primary key. The goal when designing the database is, of course, to keep this set minimal. You wouldn't use the pair (SSN, creditCardNumber) if the SSN alone already guarantees uniqueness.

Now, suppose that fullName, SSN and creditCardNumber are all fields that you know, somehow, that are unique for all users. You could use any of those as your PK, so they are all candidate keys (whereas age, on the other hand, is not).
Which will you choose? That will depend on factors such as the datatype of the field (it's preferable to set an index on an integer column rather than on an varchar column, for example).



回答4:

Candidate key means exactly the same thing as primary key. All candidate keys serve the same purpose: to ensure data integrity by preventing duplicate data. It's perfectly sensible for a table to have more than one candidate key enforced in it if that's necessary to ensure some data is not duplicated.

Note that minimal in this context does NOT mean smallest. It means irreducible. i.e. no attribute can be removed from the key while still maintaining its uniqueness.



回答5:

In the RM (relational model): A superkey is a unique column set. A CK (candidate key) is a superkey containing no smaller superkey. It is a candidate for being a PK (primary key). A PK is just some CK you decided to call the PK. The other CKs are AKs (alternate keys).

A SQL PK is a UNIQUE NOT NULL, not necessarily a CK/PK. So we can say an SQL CK is UNIQUE NOT NULL, a candidate for being declared as SQL PK, and we can call the others SQL AKs. In an SQL context you have to ask people what they mean when they use these terms with different meanings in the RM & SQL.



回答6:

A "candidate key" is a candidate in exactly the same sense that candidates for President are candidates: They are the choices presented to you, from which you must choose one. They are candidates for the primary key. Pick one. In practice, not all choices are necessarily equally good.



回答7:

Candidate keys usually refer to those columns which could potentially be selected as the natural primary key. However, natural primary keys are also often a bad idea because they are unique but not unchanging (think of the havoc of changing millions of child rows because a company name changed) or because they are less efficient in joins than surrogate keys. Further, in real life, many potential candidate keys are not nearly stable enough for a true PK and are not as unique as we think. Emails, for instance, can be reused after an account is closed.

You add other indexes to specify uniqueness in order to maintain data integrity. For instance in your example, you are using a surrogate key to ensure uniqueness of a record. But this does not ensure that the person/phone combination is entered only once. So you would want to create a unique index for any possible candidate key (and this can consist of one or more fields) in the data if you are using a surrogate key. This ensures that the items which need to be unique are and allows you to get the performance benefits of a surrogate key.

Further sometimes there are mulitple fields or combinations of fields which should be unique. For instance suppose you have a table storing organization structure. Each organizational entity should be unique, so should each organization/person combination (assuming that there is no job sharing). By placing unique indexes on these fields, you are able to prevent bad data from being inserted into the table.



回答8:

We use candidate key, because some times it is necessary to take care of the record and other issue, for example a data base for bank system, where Account_No is primary key and SocialSecurity_NO will be candidate key, we keep the SocialSecurity_NO as UNIQUE key, because if government got a mistake in providing the SocialSecurity_NO, then it will be big problem, so we have already declared it as UNIQUE key, so there is no possibility that two user with same SocialSecurity_NO will be able for account...

primary key----------- candidate key---------attribute3-------attribute4

Account_No SocialSecurity_NO



回答9:

In nutshell: CANDIDATE KEY is a minimal SUPER KEY.

Where Super key is the combination of columns(or attributes) that uniquely identify any record(or tuple) in a relation(table) in RDBMS.


For instance, consider the following dependencies in a table having columns A, B, C and D (Giving this table just for a quick example so not covering all dependencies that R could have).

Attribute set (Determinant)---Can Identify--->(Dependent)

A-----> AD

B-----> ABCD

C-----> CD

AC----->ACD

AB----->ABCD

ABC----->ABCD

BCD----->ABCD


Now, B, AB, ABC, BCD identifies all columns so those four qualify for the super key.

But, B⊂AB; B⊂ABC; B⊂BCD hence AB, ABC, and BCD disqualified for CANDIDATE KEY as their subsets could identify the relation, so they aren't minimal and hence only B is the candidate key, not the others.

Thanks for asking.