I've been investigating the use of GUIDs as primary keys in databases. So far, the pros seem to outweigh the cons. However, I see one point where GUIDs may not be what I want.
In my application, users should be able to identify objects based on a user-friendly ID. So, for example, if they want to get a specific product without typing in the full name, they can use the product's ID. GUIDs aren't easy to remember for something like that.
The solution I've been thinking about is to use both a GUID and an auto-incrementing integer. The GUID would be the row's primary key, while the auto-incrementing integer would be an index used by the application's filtering functions. All SQL SELECT, UPDATE, DELETE statements would use the GUID, however.
The main reason I want to use GUIDs is to prevent clashes when merging two databases. If Database #1 and Database #2 both have a Product #2, the importer script would have to change the ID and all foreign keys referring to it. With GUIDs, I only have to change the user-friendly ID in the table itself, while foreign keys would use the GUID unique to each imported record and will therefore work without modification.
So, my question is: are there any major problems (besides the size of the GUID field and easy page fragmentation) with having an auto-incrementing integer index and a GUID primary key?
I always tend to use surrogate primary keys in my database.
That is: those primary keys have no actual meaning in the problem domain, and thus, those primary keys are never exposed to users.
(If this surrogate primary key is of type GUID or an identity, I don't care; this depends on the requirements).
If you say that users should be able to identify objects based on a user-friendly ID, then, I think that this user-friendly ID is a value that belongs to your 'problem domain'.
This means, that this ID should indeed be an attribute in your table, but it should not be used as the primary key in your table.
This also allows you to easily modify the value of such an user-friendly ID (if that should be necessary), without you having to worry about modifying related foreign keys as well.
"Why do "users should be able to identify objects based on a user-friendly ID" ?
In my opinion, your users should itentify records using codes.
Let's say your database contains products (as you mentionned it in Question). Wouldn't it be better if they had codes to represent products, that the users could enter.
Let's say you have tables and chairs, as a user, i would prefer using tbl and chr than 1 and 2 to identify what I am talking about.
In MySQL
, you'll need to set your numeric ID
as a PRIMARY KEY
, as AUTO_INCREMENT
may be only the PRIMARY KEY
, which means it should also be NOT NULL
.
You can still define a UNIQUE INDEX
on your GUID
column and use it anywhere, though an InnoDB
table will be clustered on the numeric id
, not on the GUID
.
There is a school of thought out there that says you should never expose your surrogate ID's to the outside world. So they'd say if you want a business ID, you should use something else for it.
This Wikipedia article, for example, says this:
Disassociation
The values of generated surrogate keys
- because they are generated and arbitrary - have no relationship to
the real-world meaning of the data
held in a row. When inspecting another
row holding a foreign key reference to
a surrogate key, it is not possible to
work out the meaning of it holding
that reference simply by looking at
the data in the row itself. A layer is
added to this indirection for each
foreign key join that one must
navigate while attempting to make
sense of a data item. This can also
make auditing more difficult, as
incorrect data is not obvious on
inspection.
Surrogate keys are also not natural
for data that is exported and shared.
A particular difficulty is that two
instances of a schema can hold records
which logically mean the same thing
(that is - they are the same in a
business sense), but which have a
different key owing to the history of
how the keys were assigned. An
approach to dealing with this is to
adopt the rule that surrogate keys are
never exported or imported: they are
never exposed outside the database
except as transient data (most
obviously, in executing applications
that have a "live" connection to the
database).
To be more specific about your question, yes there are other problems with using GUIDs as primary keys in databases:
http://www.sqlskills.com/BLOGS/KIMBERLY/post/GUIDs-as-PRIMARY-KEYs-andor-the-clustering-key.aspx
The problem is not so much with using a GUID as primary key, its using a non-sequential GUID as the clustered index for a table.
The takeaway here is to either use other fields as the clustered index, or use a sequential GUID to avoid this fragmentation.