可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效，请关闭广告屏蔽插件后再试):

问题:

I want to create a table in MS SQL Server 2005 to record details of certain system operations. As you can see from the table design below, every column apart from Details is is non nullable.

CREATE TABLE [Log]
(
[LogID] [int] IDENTITY(1,1) NOT NULL,
[ActionID] [int] NOT NULL,
[SystemID] [int] NOT NULL,
[UserID] [int] NOT NULL,
[LoggedOn] [datetime] NOT NULL,
[Details] [varchar](max) NULL
)

Because the Details column won't always have data in it. Is it more efficient to store this column in a separate table and provide a link to it instead?

CREATE TABLE [Log]
(
[LogID] [int] IDENTITY(1,1) NOT NULL,
[ActionID] [int] NOT NULL,
[SystemID] [int] NOT NULL,
[UserID] [int] NOT NULL,
[LoggedOn] [datetime] NOT NULL,
[DetailID] [int] NULL
)       

CREATE TABLE [Detail]
(
[DetailID] [int] IDENTITY(1,1) NOT NULL,
[Details] [varchar](max) NOT NULL
)

For a smaller data type I wouldn't really consider it, but for a varchar(max) does doing this help keep the table size smaller? Or I am just trying to out smart the database and achieving nothing?

回答1:

Keep it inline. Under the covers SQL Server already stores the MAX columns in a separate 'allocation unit' since SQL 2005. See Table and Index Organization. This in effect is exactly the same as keeping the MAX column in its own table, but w/o any disadvantage of explicitly doing so.

Having an explicit table would actually be both slower (because of the foreign key constraint) and consume more space (because of the DetaiID duplication). Not to mention that it requires more code, and bugs are introduced by... writing code.

alt text http://i.msdn.microsoft.com/ms189051.3be61595-d405-4b30-9794-755842d7db7e(en-us,SQL.100).gif

Update

To check the actual location of data, a simple test can show it:

use tempdb;
go

create table a (
  id int identity(1,1) not null primary key,
  v_a varchar(8000),
  nv_a nvarchar(4000),
  m_a varchar(max),
  nm_a nvarchar(max),
  t text,
  nt ntext);
go

insert into a (v_a, nv_a, m_a, nm_a, t, nt)
values ('v_a', N'nv_a', 'm_a', N'nm_a', 't', N'nt');
go

select %%physloc%%,* from a
go

The %%physloc%% pseudo column will show the actual physical location of the row, in my case it was page 200:

dbcc traceon(3604)
dbcc page(2,1, 200, 3)

Slot 0 Column 2 Offset 0x19 Length 3 Length (physical) 3
v_a = v_a                            
Slot 0 Column 3 Offset 0x1c Length 8 Length (physical) 8
nv_a = nv_a                          
m_a = [BLOB Inline Data] Slot 0 Column 4 Offset 0x24 Length 3 Length (physical) 3
m_a = 0x6d5f61                       
nm_a = [BLOB Inline Data] Slot 0 Column 5 Offset 0x27 Length 8 Length (physical) 8
nm_a = 0x6e006d005f006100            
t = [Textpointer] Slot 0 Column 6 Offset 0x2f Length 16 Length (physical) 16
TextTimeStamp = 131137536            RowId = (1:182:0)                    
nt = [Textpointer] Slot 0 Column 7 Offset 0x3f Length 16 Length (physical) 16
TextTimeStamp = 131203072            RowId = (1:182:1)

All column values but the TEXT and NTEXT were stored inline, including the MAX types.
After changing the table options and insert a new row (sp_tableoption does not affect existing rows), the MAX types were evicted into their own storage:

sp_tableoption 'a' , 'large value types out of row', '1';
insert into a (v_a, nv_a, m_a, nm_a, t, nt)
values ('2v_a', N'2nv_a', '2m_a', N'2nm_a', '2t', N'2nt');    
dbcc page(2,1, 200, 3);

Note how m_a and nm_a columns are now a Textpointer into the LOB allocation unit:

Slot 1 Column 2 Offset 0x19 Length 4 Length (physical) 4
v_a = 2v_a                           
Slot 1 Column 3 Offset 0x1d Length 10 Length (physical) 10
nv_a = 2nv_a                         
m_a = [Textpointer] Slot 1 Column 4 Offset 0x27 Length 16 Length (physical) 16
TextTimeStamp = 131268608            RowId = (1:182:2)                    
nm_a = [Textpointer] Slot 1 Column 5 Offset 0x37 Length 16 Length (physical) 16
TextTimeStamp = 131334144            RowId = (1:182:3)                    
t = [Textpointer] Slot 1 Column 6 Offset 0x47 Length 16 Length (physical) 16
TextTimeStamp = 131399680            RowId = (1:182:4)                    
nt = [Textpointer] Slot 1 Column 7 Offset 0x57 Length 16 Length (physical) 16
TextTimeStamp = 131465216            RowId = (1:182:5)

For completion sakeness we can also force the one of the non-max fields out of row:

update a set v_a = replicate('X', 8000);
dbcc page(2,1, 200, 3);

Note how the v_a column is stored in the Row-Overflow storage:

Slot 0 Column 1 Offset 0x4 Length 4 Length (physical) 4
v_a = [BLOB Inline Root] Slot 0 Column 2 Offset 0x19 Length 24 Length (physical) 24
Level = 0                            Unused = 99                          UpdateSeq = 1
TimeStamp = 1098383360               
Link 0
Size = 8000                          RowId = (1:176:0)

So, as other have already commented, the MAX types are stored inline by default, if they fit. For many DW projects this would be unnacceptable because the typical DW loads must scan or at least range scan, so the sp_tableoption ..., 'large value types out of row', '1' should be used. Note that this does not affect existing rows, in my test not even on index rebuild, so the option has to be turned on early.

For most OLTP type loads though the fact that MAX types are stored inline if possible is actually an advantage, since the OLTP access pattern is to seek and the row width makes little impact on it.

None the less, regarding the original question: separate table is not necessary. Turning on the large value types out of row option achieves the same result at a free cost for development/test.

回答2:

Paradoxically, if your data is normally less than 8000 characters, I would store it in a separate table, while if the data is greater than 8000 characters, I would keep it in the same table.

This is because what happens is that SQL Server keeps the data in the page if it allows the row to sit in single page, but when the data gets larger, it moves it out just like the TEXT data type and leaves just a pointer in the row. So for a bunch of 3000 character rows, you are fitting less rows per page, which is really inefficient, but for a bunch of 12000 character rows, the data is out of the row, so it's actually more efficient.

Having said this, typically you have a wide ranging mix of lengths and thus I would move it into its own table. This gives you flexibility for moving this table to a different file group etc.

Note that you can also specify it to force the data out of the row using the sp_tableoption. varchar(max) is basically similar to the TEXT data type with it defaulting to data in row (for varchar(max)) instead of defaulting to data out of row (for TEXT).

回答3:

You should structure your data into whatever seems the most logical structure and allow SQL Server to perform its optimizations as to how to physically store the data.

If you find, through performance analysis, that your structure is a performance problem, then consider performing changes to your structure or to storage settings.

回答4:

Keep it inline. The whole point of varchar is that it takes up 0 bytes if it's empty, 4 bytes for 'Hello', and so on.

回答5:

I would normalize it by creating the Detail table. I assume some of the entries in Log will have the same Detail? So if you normalize it you will only be storing an FK id INTEGER instead of the text for every occurrence if you stored the text on the Detail table. If you have reasons to de-normalize do it, but from your question I don't see that being the case.

回答6:

Having a nullable column costs 2 bytes for every 16 of them. If this is the only (or 17th, or 33nd, etc) nullable column in the table, it will cost you 2 bytes per row, otherwise nothing.