I have found umpteen posts which begin like Quite a lot of time I have come across people saying "Clustered Index Physically sorts the data inside the table based on the Clustered Index Keys". It's not true! Then such posts go on to describe how it is actually stored, via linked lists or whatever. For example, this post says that
Each Index row contains a Key value and a pointer to either an
Intermediate level page in the B-tree, or a Data row in the Leaf level
of the Index. The Pages in each level of the Index are linked in a
Doubly-linked list. The Pages in the Data chain and the rows in them
are ordered on the value of the Clustered Index key.
That brings me to my question, the data pages are the place where the table data are stored, right? So if they are sorted and the data within them also are sorted according to the indexed column value, why is it wrong to say that the clustered index keeps the table data in sorted order? Here is a pic from Kalen Delaney's book, which shows that the leaf pages in a table with CI are all sorted according to the CI value:
You're right.
Clustered indexes do not physically sort the data inside the table based on the Clustered Index Keys. If that was the case then the inserts into the middle of a large table with no free space would require huge amounts of IO to make room for the new record.
Instead a new page is allocated from anywhere in the file and linked into the linked list.
The degree to which the physical order of pages differs from the logical order is the extent of logical fragmentation. Rebuilding or reorganizing the index can reduce this.
When you create an index, there is also an index table is created(I think its called Index allocation map (IAM), not so sure about the name)
In case of clustered index, the index table contains the index column, and pointer to the actual records.
So When a table has a clustered index, data may not be physically sorted on the table..
The data in the disk will be maintained as a linked list and clustered index is a pointer to that data.
Now the index table will be sorted physically... not the actual table..and the index table is maintained as a B-Tree, so that searching would be faster.
Now when you create a non-clustered index, it will point to the clustered index table
Edit: (as marc_s pointed out) Leaf node of clustered index actually contains data, where as in non clustered index contains pointers..
But still I don't believe, it will reorder the data in the disk, it will just reorder the pointers
Clustered indexes order the table data by the columns of the index. Each new row will be positioned in the right spot of the table when inserted or updated.
This doesn't happen with nonclustered indexes.
Below blog post clearly explains how clustered indexes are stored.
Does a Clustered Index really physically store the rows in key order?
My original statement here is WRONG
Because any index does NOT affect the data in the table at all. Clustered index is just another type of index pointing to data in the table. It does not change the order or does anything else to the data.
You can always fetch data directly from the table with row numbers before and after you create (clustered or unclustered) index.
End of Original statement
Correction is required (I don't use MSSQL very often, so never had a chance to test this before)
It seems that MSSQL implements clustered index as not really an index at all, but probably closer to trigger/constraint pair.
From my crude test right now:
1)
CREATE TABLE testTable ...
INSERT ... (few rows)
SELECT * FROM testTable
This shows ALL results in insertion order
2)
CREATE CLUSTERED INDEX ... ON testTable (...);
INSERT ... (few rows)
SELECT * FROM testTable
This shows ALL results ordered by fields in CLUSTERED INDEX
3)
DROP INDEX (CLUSTERED INDEX Name) ON testTable;
INSERT ... (few rows)
SELECT * FROM testTable
This shows ALL results from step 2) [before DROP INDEX
] in the same order and rows inserted later [in step 3)] in insertion order again.
To me it means that MSSQL DOES re-order the actual data records (most likely at great cost on insert/delete).
So, I stand corrected and rebuked. In all honesty I never expected this (CLUSTERED INDEX behaviour, not me being proved wrong) to be the case.