I have recently started a new job and noticed that all the SQL tables use the GUID data type for the primary key.
In my previous job we used integers (Auto-Increment) for the primary key and it was a lot more easier to work with in my opinion.
For example, say you had two related tables; Product and ProductType - I could easily cross check the 'ProductTypeID' column of both tables for a particular row to quickly map the data in my head because its easy to store the number (2,4,45 etc) as opposed to (E75B92A3-3299-4407-A913-C5CA196B3CAB).
The extra frustration comes from me wanting to understand how the tables are related, sadly there is no Database diagram :(
A lot of people say that GUID's are better because you can define the unique identifer in your C# code for example using NewID() without requiring SQL SERVER to do it - this also allows you to know provisionally what the ID will be.... but I've seen that it is possible to still retrieve the 'next auto-incremented integer' too.
A DBA contractor reported that our queries could be up to 30% faster if we used the Integer type instead of GUIDS...
Why does the GUID data type exist, what advantages does it really provide?... Even if its a choice by some professional there must be some good reasons as to why its implemented?
They basically save you from more sometimes complicated logic of using
There are a ton of Google-able articles on using GUIDs as PKs and almost all of them say the same thing your DBA contractor says -- queries are faster without GUIDs as keys.
The primary use I've seen in practice (we've never used them as PKs) is with replication. The MSDN page for uniqueidentifier says about the same.
GUIDs are good as identity fields in certain cases:
GUIDs are generated to be globally unique, which is why they are suited for such scenarios.
Contrary to what most folks here seem to preach, I see GUID's as more of a plague than a blessing. Here's why:
GUIDs may seem to be a natural choice for your primary key - and if you really must, you could probably argue to use it for the PRIMARY KEY of the table. What I'd strongly recommend not to do is use the GUID column as the clustering key, which SQL Server does by default, unless you specifically tell it not to.
You really need to keep two issues apart:
the primary key is a logical construct - one of the candidate keys that uniquely and reliably identifies every row in your table. This can be anything, really - an INT, a GUID, a string - pick what makes most sense for your scenario.
the clustering key (the column or columns that define the "clustered index" on the table) - this is a physical storage-related thing, and here, a small, stable, ever-increasing data type is your best pick - INT or BIGINT as your default option.
By default, the primary key on a SQL Server table is also used as the clustering key - but that doesn't need to be that way! I've personally seen massive performance gains when breaking up the previous GUID-based Primary / Clustered Key into two separate key - the primary (logical) key on the GUID, and the clustering (ordering) key on a separate INT IDENTITY(1,1) column.
As Kimberly Tripp - the Queen of Indexing - and others have stated a great many times - a GUID as the clustering key isn't optimal, since due to its randomness, it will lead to massive page and index fragmentation and to generally bad performance.
Yes, I know - there's
newsequentialid()
in SQL Server 2005 and up - but even that is not truly and fully sequential and thus also suffers from the same problems as the GUID - just a bit less prominently so. Plus, you can only use it as a default for a column in your table - you cannot get a new sequential GUID in T-SQL code (like a trigger or something) - another major drawback.Then there's another issue to consider: the clustering key on a table will be added to each and every entry on each and every non-clustered index on your table as well - thus you really want to make sure it's as small as possible. Typically, an INT with 2+ billion rows should be sufficient for the vast majority of tables - and compared to a GUID as the clustering key, you can save yourself hundreds of megabytes of storage on disk and in server memory.
Quick calculation - using INT vs. GUID as Primary and Clustering Key:
TOTAL: 25 MB vs. 106 MB - and that's just on a single table!
Some more food for thought - excellent stuff by Kimberly Tripp - read it, read it again, digest it! It's the SQL Server indexing gospel, really.
Marc
INT
Advantage:
Numeric values (and specifically integers) are better for performance when used in joins, indexes and conditions. Numeric values are easier to understand for application users if they are displayed.
Disadvantage:
If your table is large, it is quite possible it will run out of it and after some numeric value there will be no additional identity to use.
GUID
Advantage:
Unique across the server.
Disadvantage:
String values are not as optimal as integer values for performance when used in joins, indexes and conditions. More storage space is required than INT.
credit goes to : http://blog.sqlauthority.com/2010/04/28/sql-server-guid-vs-int-your-opinion/
It is globally unique, so that each record in your table has a GUID that is shared by no other item of any kind in the world. Handy if you need this kind of exclusive identification (if you are replicating the database, or combining data from multiple source). Otherwise, your dba is correct - GUIDs are much larger and less efficient that integers, and you could speed up your db (30%? maybe...)