I have limited exposure to databases and have only used them as an application programmer. I want to know about clustered and non-clustered indexes.
I googled, and this is what I found:
A clustered index is a special type of index that reorders the way records in the table are physically stored. Therefore table can have only one clustered index. The leaf nodes of a clustered index contain the data pages. A nonclustered index is a special type of index in which the logical order of the index does not match the physical stored order of the rows on disk. The leaf node of a nonclustered index does not consist of the data pages. Instead, the leaf nodes contain index rows.
What I found on SO was "What are the differences between a clustered and a non-clustered index?".
Can someone explain this in plain English?
I realize this is a very old question, but I thought I would offer an analogy to help illustrate the fine answers above.
CLUSTERED INDEX
If you walk into a public library, you will find that the books are all arranged in a particular order (most likely the Dewey Decimal System, or DDS). This corresponds to the "clustered index" of the books. If the DDS# for the book you want was 005.7565 F736s, you would start by locating the row of bookshelves that is labeled 001-099 or something like that. (This endcap sign at the end of the stack corresponds to an "intermediate node" in the index.) Eventually you would drill down to the specific shelf labelled 005.7450 - 005.7600, then you would scan until you found the book with the specified DDS#, and at that point you have found your book.
NON-CLUSTERED INDEX
But if you didn't come into the library with the DDS# of your book memorized, then you would need a second index to assist you. In the olden days you would find at the front of the library a wonderful bureau of drawers known as the "Card Catalog". In it were thousands of 3x5 cards -- one for each book, sorted in alphabetical order (by title, perhaps). This corresponds to the "non-clustered index". These card catalogs were organized in a hierarchical structure, so that each drawer would be labeled with the range of cards it contained (Ka - Kl, for example; i.e., the "intermediate node"). Once again, you would drill in until you found your book, but in this case, once you have found it (i.e., the "leaf node"), you don't have the book itself, but just a card with an index number (the DDS#) with which you could find the actual book in the clustered index.
Of course, nothing would stop the librarian from photocopying all the cards and sorting them in a different order in a separate card catalog. (Typically there were at least two such catalogs: one sorted by author name, and one by title.) In principle, you could have as many of these "non-clustered" indexes as you want.
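The library analogy can be sketched in a few lines of Python. This is only a toy model, not how any database engine is implemented: the book titles and the two extra DDS numbers are made-up sample data, and the function names find_by_dds and find_by_title are hypothetical. The shelf list (sorted by DDS#) stands in for the clustered index, and the card catalog (sorted by title, holding only the DDS#) stands in for a non-clustered index.

```python
from bisect import bisect_left

# Hypothetical sample data: (DDS#, title) pairs, kept sorted by DDS#.
# The sorted shelf list plays the role of the clustered index: its
# "leaf level" holds the books themselves.
shelves = sorted([
    ("005.7565 F736s", "SQL Internals"),
    ("005.1330 K39c", "C Programming"),
    ("510.0000 S897c", "Calculus"),
])
shelf_keys = [dds for dds, _ in shelves]

# The card catalog plays the role of a non-clustered index: it is sorted
# by title, and each card holds only the DDS# (the locator), not the book.
catalog = sorted((title, dds) for dds, title in shelves)
catalog_keys = [title for title, _ in catalog]


def find_by_dds(dds):
    """Clustered lookup: one search, and the result is the book itself."""
    i = bisect_left(shelf_keys, dds)
    if i < len(shelves) and shelf_keys[i] == dds:
        return shelves[i]
    return None


def find_by_title(title):
    """Non-clustered lookup: find the card, then walk to the shelf."""
    i = bisect_left(catalog_keys, title)
    if i < len(catalog) and catalog_keys[i] == title:
        _, dds = catalog[i]
        return find_by_dds(dds)  # second step: follow the locator
    return None
```

Looking a book up by title takes two searches (catalog, then shelf), while looking it up by DDS# takes one. A second catalog sorted by author would just be another sorted list of (author, DDS#) pairs.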
With a clustered index the rows are stored physically on the disk in the same order as the index. Therefore, there can be only one clustered index.
With a non clustered index there is a second list that has pointers to the physical rows. You can have many non clustered indexes, although each new index will increase the time it takes to write new records.
It is generally faster to read from a clustered index if you want to get back all the columns. You do not have to go first to the index and then to the table.
Writing to a table with a clustered index can be slower, if there is a need to rearrange the data.
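To make the write-cost point concrete, here is a rough Python sketch that simply counts how many sorted structures an insert has to touch. The Table class, its column names, and the writes counter are all invented for illustration, and keys are assumed to be unique; a real engine does far more work per write than this.

```python
from bisect import insort


class Table:
    """Toy table: one sorted row list plus any number of secondary indexes."""

    def __init__(self, key_col, secondary_cols):
        self.key_col = key_col
        self.rows = []  # kept sorted by key_col, like clustered leaf rows
        # One sorted (value, key) list per non-clustered index.
        self.secondary = {col: [] for col in secondary_cols}
        self.writes = 0  # number of structures touched by inserts so far

    def insert(self, row):
        # Assumes key values are unique, so the dict rows are never compared.
        insort(self.rows, (row[self.key_col], row))
        self.writes += 1
        for col, entries in self.secondary.items():
            # Each secondary entry stores the value and the row's key.
            insort(entries, (row[col], row[self.key_col]))
            self.writes += 1


plain = Table("id", [])
indexed = Table("id", ["name", "city"])
for t in (plain, indexed):
    t.insert({"id": 2, "name": "Bo", "city": "Oslo"})
    t.insert({"id": 1, "name": "Al", "city": "Rome"})
print(plain.writes, indexed.writes)  # 2 6
```

Two inserts touch two structures in the table without secondary indexes and six in the table with two of them, which is the extra write work the answer refers to.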
In SQL Server row oriented storage both clustered and nonclustered indexes are organized as B trees.
The key difference between clustered indexes and non clustered indexes is that the leaf level of the clustered index is the table. This has two implications.
Non clustered indexes can also do point 1 by using the INCLUDE clause (since SQL Server 2005) to explicitly include all non key columns, but they are secondary representations and there is always another copy of the data around (the table itself).
The two indexes above will be nearly identical, with the upper level index pages containing values for the key columns A,B and the leaf level pages containing A,B,C,D.
The above quote from SQL Server books online causes much confusion.
In my opinion it would be much better phrased as.
The books online quote is not incorrect but you should be clear that the "sorting" of both non clustered and clustered indices is logical not physical. If you read the pages at leaf level by following the linked list and read the rows on the page in slot array order then you will read the index rows in sorted order but physically the pages may not be sorted. The commonly held belief that with a clustered index the rows are always stored physically on the disk in the same order as the index key is false.
This would be an absurd implementation. For example, if a row is inserted into the middle of a 4GB table, SQL Server does not have to copy 2GB of data up in the file to make room for the newly inserted row.
Instead a page split occurs. Each page at the leaf level of both clustered and non clustered indexes has the address (File:Page) of the next and previous page in logical key order. These pages need not be either contiguous or in key order, e.g. the linked page chain might be 1:2000 <-> 1:157 <-> 1:7053
When a page split happens a new page is allocated from anywhere in the filegroup (from either a mixed extent, for small tables, or a non empty uniform extent belonging to that object or a newly allocated uniform extent). This might not even be in the same file if the file group contains more than one.
The degree to which the logical order and contiguity differs from the idealised physical version is the degree of logical fragmentation.
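The page split idea can be illustrated with a small Python model of the leaf level as a doubly linked chain of pages. This is a deliberately simplified sketch, not SQL Server's allocation logic: the Page and LeafLevel classes, the capacity of 4 rows per page, and the "next free page number" allocator are all assumptions made for the example, and the upper B tree levels are left out.

```python
class Page:
    def __init__(self, page_id):
        self.page_id = page_id  # physical position in the file
        self.keys = []          # sorted keys stored on this page
        self.next = None        # next page in logical key order
        self.prev = None        # previous page in logical key order


class LeafLevel:
    CAPACITY = 4  # hypothetical number of rows that fit on a page

    def __init__(self):
        self.next_free_id = 0
        self.first = self._allocate()

    def _allocate(self):
        # Simplification: hand out the next unused page number.
        page = Page(self.next_free_id)
        self.next_free_id += 1
        return page

    def insert(self, key):
        # Walk the chain to the page whose key range covers the new key.
        page = self.first
        while page.next is not None and key >= page.next.keys[0]:
            page = page.next
        page.keys.append(key)
        page.keys.sort()
        if len(page.keys) > self.CAPACITY:
            self._split(page)

    def _split(self, page):
        # Move the upper half to a new page and relink the chain;
        # no existing page is moved or rewritten elsewhere.
        new = self._allocate()
        half = len(page.keys) // 2
        new.keys, page.keys = page.keys[half:], page.keys[:half]
        new.prev, new.next = page, page.next
        if page.next is not None:
            page.next.prev = new
        page.next = new

    def chain(self):
        """Return (page_id, keys) pairs in logical key order."""
        out, page = [], self.first
        while page is not None:
            out.append((page.page_id, page.keys))
            page = page.next
        return out


leaf = LeafLevel()
for key in [10, 20, 30, 40, 50, 60, 70, 11, 12, 13]:
    leaf.insert(key)
for page_id, keys in leaf.chain():
    print(page_id, keys)
```

Following the links visits pages 0, 3, 1, 2 in that order: the keys come back sorted, but the late inserts of 11, 12 and 13 forced a split whose new page landed at the end of the file, so logical order and physical order no longer agree.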
In a newly created database with a single file I ran the following.
Then checked the page layout with
Results were all over the place. The first row in key order (with value 1 - highlighted with arrow below) was on nearly the last physical page.
Fragmentation can be reduced or removed by rebuilding or reorganising an index to increase the correlation between logical order and physical order.
After running
I got the following
If the table has no clustered index it is called a heap.
Non clustered indexes can be built on either a heap or a clustered index. They always contain a row locator back to the base table. In the case of a heap this is a physical row identifier (rid) and consists of three components (File:Page:Slot). In the case of a Clustered index the row locator is logical (the clustered index key).
For the latter case if the non clustered index already naturally includes the CI key column(s), either as NCI key columns or INCLUDE-d columns, then nothing is added. Otherwise the missing CI key column(s) silently get added in to the NCI.
SQL Server always ensures that the key columns are unique for both types of index. The mechanism in which this is enforced for indexes not declared as unique differs between the two index types however.
Clustered indexes get a uniquifier added for any rows with key values that duplicate an existing row. This is just an ascending integer.
For non clustered indexes not declared as unique SQL Server silently adds the row locator in to the non clustered index key. This applies to all rows, not just those that are actually duplicates.
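A small Python sketch may help show how these two mechanisms make every index key unique. It is a model of the idea only: the sample rows, the with_uniquifiers helper, and the choice to represent "no uniquifier" as 0 and to number duplicates from 1 are assumptions made for the example, not a description of the on-disk format.

```python
from collections import defaultdict

# Hypothetical rows: (clustered key, name). Neither the clustered index
# on the first column nor the NCI on name is declared unique.
rows = [(5, "Ann"), (7, "Bob"), (5, "Ann"), (5, "Cid")]


def with_uniquifiers(rows):
    """Extend each clustered key to (key, uniquifier).

    0 stands for "no uniquifier stored"; later duplicates of a key get
    1, 2, ... in insertion order.
    """
    seen = defaultdict(int)
    out = []
    for key, name in rows:
        out.append(((key, seen[key]), name))
        seen[key] += 1
    return out


# Clustered index: rows ordered by (key, uniquifier).
clustered = sorted(with_uniquifiers(rows))

# Non-unique NCI on name: the row locator (here the full clustered key,
# uniquifier included) is appended to the NCI key for every row.
nci = sorted((name, locator) for locator, name in clustered)

for entry in nci:
    print(entry)
```

The two "Ann" rows end up as ("Ann", (5, 0)) and ("Ann", (5, 1)), so the NCI keys are distinct even though the declared key column is not, and the locator is present on the "Bob" and "Cid" entries too even though they have no duplicates.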
The clustered vs non clustered nomenclature is also used for column store indexes. The paper Enhancements to SQL Server Column Stores states
Clustered Index
A clustered index determines the physical order of the data in a table. For this reason a table can have only one clustered index.
It is like a dictionary: no other index is needed, because the entries are already ordered by word.
Nonclustered Index
A non clustered index is analogous to the index in a book. The data is stored in one place and the index is stored in another, and the index has pointers to the storage location of the data. For this reason a table can have more than one nonclustered index.
It is like a chemistry book: at the start there is a separate index pointing to the chapter locations, and at the end there is another index pointing to the locations of common words.
Clustered Index: A PRIMARY KEY constraint creates a clustered index automatically if no clustered index already exists on the table and NONCLUSTERED is not specified. The actual data rows are stored at the leaf level of the clustered index.
Non Clustered Index: The actual data is not found directly at the leaf node; an additional step is needed, because the leaf holds only the index key values and row locators pointing to the actual data. Unlike a clustered index, a non clustered index does not determine the order in which the table's rows are stored. There can be multiple non clustered indexes per table; the limit depends on the SQL Server version. SQL Server 2005 allows 249 non clustered indexes per table, and later versions such as 2008 and 2016 allow 999.
Find below some characteristics of clustered and non-clustered indexes:
Clustered Indexes
create Index index_name(col1, col2, col.....).
Non-clustered Indexes