After going thru some tutorials on SQL Server 2008's new feature "SPARSE COLUMN", I have found that it doesn't take any space if the column value is 0 or NULL but when there is a value, it takes 4 times the space a regular(non sparse) column holds.
If my understanding is correct, then why I will go for that at the time of database design?
And if I use that, then at what situation will I be?
Also out of curiosity, how does no space get reserved when a column is defined as sparse column (I mean to say, what is the internal implementation for that?)
A sparse column doesn't use 4x the amount of space to store a value, it uses a (fixed) 4 extra bytes per non-null value. (As you've already stated, a NULL takes 0 space.)
So a non-null value stored in a bit column would be 1 bit + 4 bytes = 4.125 bytes. But if 99% of these are NULL, it is still a net savings.
A non-null value stored in a GUID (UniqueIdentifier) column is 16 bytes + 4 bytes = 20 bytes. So if only 50% of these are NULL, that's still a net savings.
So the "expected savings" depends strongly on what kind of column we're talking about, and your estimate of what ratio will be null vs non-null. Variable width columns (varchars) are probably a little more difficult to predict accurately.
This Books Online Page has a table showing what percentage of different data types would need to be null for you to end up with a benefit.
So when should you use a Sparse Column? When you expect a significant percentage of the rows to have a NULL value. Some examples that come to mind:
- A "Order Return Date" column in an order table. You would hope that a very small percent of sales would result in returned products.
- A "4th Address" line in an Address table. Most mailing addresses, even if you need a Department name and a "Care Of" probably don't need 4 separate lines.
- A "Suffix" column in a customer table. A fairly low percent of people have a "Jr." or "III" or "Esquire" after their name.
You're reading it wrong - it never takes 4x the space.
Specifically, it says 4* (4 bytes, see footnote), not 4x (multiply by 4). The only case where it's exactly 4x the space is a char(4), which would see savings if the NULLs exist more than 64% of the time.
"*The length is equal to the average of the data that is contained in the type, plus 2 or 4 bytes."
| datetime NULL | datetime SPARSE NULL | datetime SPARSE NULL |
|--------------------|----------------------|----------------------|
| 20171213 (8 bytes) | 20171213 (12 bytes) | 20171213 (12 bytes) |
| NULL (8 bytes) | 20171213 (12 bytes) | 20171213 (12 bytes) |
| 20171213 (8 bytes) | NULL (0 bytes) | NULL (0 bytes) |
| NULL (8 bytes) | NULL (0 bytes) | NULL (0 bytes) |
You lose 4 bytes not just once per row; but for every cell in the row that is not null.
From SQL SERVER – 2008 – Introduction to SPARSE Columns – Part 2 by Pinal Dave:
All SPARSE columns are stored as one XML column in database. Let us
see some of the advantage and disadvantage of SPARSE column.
Advantages of SPARSE column are:
INSERT, UPDATE, and DELETE statements can reference the sparse columns by name. SPARSE column can work as one XML column as well.
SPARSE column can take advantage of filtered Indexes, where data are filled in the row.
SPARSE column saves lots of database space when there are zero or null values in database.
Disadvantages of SPARSE column are:
SPARSE column does not have IDENTITY or ROWGUIDCOL property.
SPARSE column can not be applied on text, ntext, image, timestamp, geometry, geography or user defined datatypes.
SPARSE column can not have default value or rule or computed column.
Clustered index or a unique primary key index can not be applied SPARSE column. SPARSE column can not be part of clustered index key.
Table containing SPARSE column can have maximum size of 8018 bytes instead of regular 8060 bytes. A table operation which involves SPARSE
column takes performance hit over regular column.