Why & When should I use SPARSE COLUMN? (SQL SERVER

2020-01-27 02:40发布

After going thru some tutorials on SQL Server 2008's new feature "SPARSE COLUMN", I have found that it doesn't take any space if the column value is 0 or NULL but when there is a value, it takes 4 times the space a regular(non sparse) column holds.

If my understanding is correct, then why I will go for that at the time of database design? And if I use that, then at what situation will I be?

Also out of curiosity, how does no space get reserved when a column is defined as sparse column (I mean to say, what is the internal implementation for that?)

5条回答
做个烂人
2楼-- · 2020-01-27 03:21
  • Storing a null in a sparse column takes up no space at all.

  • To any external application the column will behave the same

  • Sparse columns work really well with filtered indexes as you will only want to create an index to deal with the non-empty attributes in the column.

  • You can create a column set over the sparse columns that returns an xml clip of all of the non-null data from columns covered by the set. The column set behaves like a column itself. Note: you can only have one column set per table.

  • Change Data Capture and Transactional replication both work, but not the column sets feature.

Downsides

  • If a sparse column has data in it it will take 4 more bytes than a normal column e.g. even a bit (0.125 bytes normally) is 4.125 bytes and unique identifier rises form 16 bytes to 20 bytes.

  • Not all data type can be sparse: text, ntext, image, timestamp, user-defined data type, geometry, or geography or varbinray (max) with the FILESTREAM attribute cannot be sparse. (Changed17/5/2009 thanks Alex for spotting the typo)

  • computed columns can't be sparse (although sparse columns can take part in a calculation in another computed column)

  • You can't apply rules or have default values.

  • Sparse columns cannot form part of a clustered index. If you need to do that use a computed column based on the sparse column and create the clustered index on that (which sort of defeats the object).

  • Merge replication doesn't work.

  • Data compression doesn't work.

  • Access (read and write) to sparse columns is more expensive, but I haven't been able to find any exact figures on this.

Reference

查看更多
戒情不戒烟
3楼-- · 2020-01-27 03:37

A sparse column doesn't use 4x the amount of space to store a value, it uses a (fixed) 4 extra bytes per non-null value. (As you've already stated, a NULL takes 0 space.)

  • So a non-null value stored in a bit column would be 1 bit + 4 bytes = 4.125 bytes. But if 99% of these are NULL, it is still a net savings.

  • A non-null value stored in a GUID (UniqueIdentifier) column is 16 bytes + 4 bytes = 20 bytes. So if only 50% of these are NULL, that's still a net savings.

So the "expected savings" depends strongly on what kind of column we're talking about, and your estimate of what ratio will be null vs non-null. Variable width columns (varchars) are probably a little more difficult to predict accurately.

This Books Online Page has a table showing what percentage of different data types would need to be null for you to end up with a benefit.

So when should you use a Sparse Column? When you expect a significant percentage of the rows to have a NULL value. Some examples that come to mind:

  • A "Order Return Date" column in an order table. You would hope that a very small percent of sales would result in returned products.
  • A "4th Address" line in an Address table. Most mailing addresses, even if you need a Department name and a "Care Of" probably don't need 4 separate lines.
  • A "Suffix" column in a customer table. A fairly low percent of people have a "Jr." or "III" or "Esquire" after their name.
查看更多
放荡不羁爱自由
4楼-- · 2020-01-27 03:39
| datetime NULL      | datetime SPARSE NULL | datetime SPARSE NULL |
|--------------------|----------------------|----------------------|
| 20171213 (8 bytes) | 20171213 (12 bytes)  | 20171213 (12 bytes)  |
| NULL     (8 bytes) | 20171213 (12 bytes)  | 20171213 (12 bytes)  |
| 20171213 (8 bytes) | NULL      (0 bytes)  | NULL      (0 bytes)  |
| NULL     (8 bytes) | NULL      (0 bytes)  | NULL      (0 bytes)  |

You lose 4 bytes not just once per row; but for every cell in the row that is not null.

查看更多
我欲成王,谁敢阻挡
5楼-- · 2020-01-27 03:39

From SQL SERVER – 2008 – Introduction to SPARSE Columns – Part 2 by Pinal Dave:

All SPARSE columns are stored as one XML column in database. Let us see some of the advantage and disadvantage of SPARSE column.

Advantages of SPARSE column are:

  • INSERT, UPDATE, and DELETE statements can reference the sparse columns by name. SPARSE column can work as one XML column as well.

  • SPARSE column can take advantage of filtered Indexes, where data are filled in the row.

  • SPARSE column saves lots of database space when there are zero or null values in database.

Disadvantages of SPARSE column are:

  • SPARSE column does not have IDENTITY or ROWGUIDCOL property.

  • SPARSE column can not be applied on text, ntext, image, timestamp, geometry, geography or user defined datatypes.

  • SPARSE column can not have default value or rule or computed column.

  • Clustered index or a unique primary key index can not be applied SPARSE column. SPARSE column can not be part of clustered index key.

  • Table containing SPARSE column can have maximum size of 8018 bytes instead of regular 8060 bytes. A table operation which involves SPARSE column takes performance hit over regular column.

查看更多
疯言疯语
6楼-- · 2020-01-27 03:41

You're reading it wrong - it never takes 4x the space.

Specifically, it says 4* (4 bytes, see footnote), not 4x (multiply by 4). The only case where it's exactly 4x the space is a char(4), which would see savings if the NULLs exist more than 64% of the time.

"*The length is equal to the average of the data that is contained in the type, plus 2 or 4 bytes."

查看更多
登录 后发表回答