SQL Server - Performance/Size Drawbacks of Null Co

2019-04-21 15:57发布

问题:

I'm working on a table design that could involve many NULL values in about 10 fields maybe 75% of the time the fields would be unused.

I just generated some fake data (a million records) and could not sense any impact on SQL Server 2005. Size difference was in the KB. Performance - no measurable difference after adding an index to the 3 non-nullable columns.

I know SQL Server 2008 has the sparse columns feature (which I assume is going to be used on the next SharePoint's UserData table). I want my code to work on 2005 though. But lots of NULL values exist in the design of the current SharePoint UserData table. So if its good enough for Microsoft...

Any good articles, links, white papers on the drawbacks or pain points around many NULL values in SQL Server table? Anyone have any experience on what happens as you scale to 10 mil or 100 mil records?

回答1:

I have never had a problem with the performance on multiple null columns, even on databases in the 100s of gigs size. I imagine you can end up with issues if you are running indexes on these fields and then using null in the query, but I have not seen this as a problem personally. Then again, I have not created database tables where every field except 3 was nullable.

On the other hand, I see an architecture problem when most of the data is null. the general reason is either a) an improperly normalized database or b) an attempt to allow users to stage data in the end table rather than creating separate tables to "build" data prior to committing to the database.

It is up to you to determine the best architecture of your database.



回答2:

What I do in this situation, which is very common, is to split the data up into two tables:

  • Required Data
  • Optional Data

For example, I'm currently writing a community website and one of the tables will obviously be a user table. I am recording a large amount of information about users and so I have split the data I collect into two tables:

  • Users
  • UserDetails

The Users table contains basic information that I will need all the time such as Username, Name and Session Information.

The UserDetails table contain extra information which I don't need as often such as Profile Page, Email Address, Password, Website Address, Date of Birth and so on.

This is known as vertical partitioning.



回答3:

The problems I've had in the past deal with the programing implications of having NULL values. For example issues with the clients, or issues with not in queries returning data when not expected because a null value was in there.



回答4:

Well, NULL is always a bit of an oddball in databases. I don't think it has too much of a performance impact in your case - but of course, you'll have to deal with all the NULL values separately.

Whenever possible, I strive to use a default value instead, so if you have e.g. some ID value of type INT, you could use 0 or -1 as a "no value present" indicator. That way, you can avoid having to do checks for value (field < 0) and check for NULL separately (field IS NULL or IS NOT NULL).

Marc



回答5:

The higher probability of NULL in column, the more close to end of record the column should be in table (to lat column in table).
The NULLS at the end of the row are not allocated any space, they are determined by NULL BITMAP linked to each record (it is 2 bytes, each bit of which tell about (non)NULL-ness of one of the column value in record).

Now, the NULL values are not read from column, they are read from NULL bitmaps. When NULL is detected the real value reading is skipped

The sparse feature should be used with cautions as it invokes overhead in time and space for non-null values For performance, you may to engage filtered indexing on non-null part of a column



回答6:

There's only one way to be sure. Go ahead and insert 100 million records then measure the end-to-end performance.



回答7:

Don't make a table with 75% unused columns. Make it with the columns your going to use all the time and look into using something like EAV for the other columns, or put them in a different table.