What does this sentence mean: Clustered indexes ar

2019-02-04 11:42发布

问题:

How are clustered indexes stored on a hard disk? What is the logical order?

How do non-clustered indexes work?

回答1:

This means that the data in the table are stored in a B-Tree according to the order of the CLUSTERED PRIMARY KEY (or the clustering columns).

This name is in my opinion a little bit confusing. The same concept in Oracle is called index-organized table which I find much more descriptive.

Non-clustered indexes contain the value of the indexed columns along with the pointer to the record they are originated from.

The "clustered index" is the table itself; the "non-clustered" index is an ordered copy of some of the table's columns.

If you "create" a clustered index, the table is rearranged. That's why you cannot have more than one "clustered index" on a table: the table cannot be arranged in more than one order.

If you create a secondary index, the shadow copy of the table is created, holding the values of the indexed columns and the pointers to the records they are from. Whenever the table changes, the copy is changed too (the engine takes care of that automatically).

Non-clustered table

id   col1   value
--   --     --
1    1      Data 1
6    1      Data 6
3    1      Data 3
7    2      Data 7
9    2      Data 9
5    2      Data 5

The table is not ordered.

Clustered table

id   col1   value
--   --     --
1    1      Data 1
3    1      Data 3
5    2      Data 5
6    1      Data 6
7    2      Data 7
9    2      Data 9

The table is ordered on id.

Clustered table with a secondary index

Table                      Index
id   col1   value          col1   id
--   --     --             --     --
1    1      Data 1         1      1
3    1      Data 3         1      3
5    2      Data 5         1      6
6    1      Data 6         2      5
7    2      Data 7         2      7
9    2      Data 9         2      9

The table is orderer on id, the index is ordered on (col1, id)



回答2:

For non-clustered indexes, a separate file is created, which hold just the index fields, which has it's records placed in the logical index order. For clustered index, there is no separate file -- the data from the table itself (all the fields) is placed in the logical order of the index.

This makes lookup on the index faster (though it's really best of indexes like dates where you'll be looking up a range). It also makes inserts rather slow if the record will be inserted in the middle.



回答3:

It means that a clustered index determined the physical order in which records in a table are actually stored. Non-clustered indices are simply lists of key values stored separately that enable fast lookups in other orderings than the clustered/physical ordering.

Quick example: a table with ID (primary key), FirstName, LastName and Car containing three people: 0=The Stig (Llana), 1=Jeremy Clarkson (DB9), 2=Richard Hammond (911), 3=James May (Lambo) and a clustered index on LastName and a non-clustered index on Car would store the actual lines of data in the table in this physical order on disk:

ID FirstName LastName Car
1  Jeremy    Clarkson DB9
2  Richard   Hammond  911
3  James     May      Lambo
0  The       Stig     Llana

The nonclustered index would also store something like:

Car   ID
911   2
DB9   1
Lambo 3
Llana 0


回答4:

Clustered Index Storage

Clustered indexes fundamentally work the exact same way that all other indexes work -- they're stored inside a variant of a struture called a B-Tree. They're stored in the same files, with the same formats as all of your other tables in SQL Server.

The Concept

Step back and think of the data that you're indexing. (I want you to think of a book in this analogy). What if, in addition to having indexes at the end of the book, you also ordered the data inside the book as well? You could look up information much faster. Take, for example, a phone book where all of the data is ordered by last name and first name. You don't have to go to the back of the phone book to find someone's number. Contrast that with a history book, where you have to go to the index at the back of the book to find what you want.

So logically, a clustered index (or "index-organized table" in Oracle) is your data, but sorted. Physically, the leaf nodes of the B-tree contain all of your table's data, in sorted order. This is really helpful when you are scanning the data in your table on a contiguous range, such as a date range.

Another important thing about clustered indexes (in SQL Server at least) is that your clustering columns (that is, the columns that make up how you sort your clustered index) are included at the end of each nonclustered index you define on your table. This makes searching for your clustering columns very fast, and this is often very desirable in OLAP databases.

Nonclustered Indexes

Your table can only be stored in one physical order. But certain times you need to look up data in other ways. For these scenarios, you use a nonclustered index. This is implemented as a B-Tree as well, but it doesn't have any bearing on the order of your table's data, like a clustered index does. That means that if you want data from your table which is not included in your non-clustered index, SQL Server will have to physically look up the data in your table in order to get what you want. This is another operation, and for many queries can be costly, and is a key design consideration when you optimize your tables.

A word

You could write a book on this stuff. Many have. If I haven't bored you to death already, check out Wikipedia's B-Tree page. Start there. If you're still (really) interested, I suggest actually programming a simple B-Tree so you can see what's involved. And, if you want to know even deeper details on how exactly SQL Server stores all of this, check out Kalen Delaney's Inside SQL Server: The Storage Engine. Is all of this learning overkill? That's for you to decide. But the more you study this, the more comfortable you will be with DB development, and the faster your systems will become. I promise.



回答5:

it means that the table is ordered as specified for the clustered index. Non clustered index are physically stored separately.



回答6:

Primary indexes are not technically "clustered" indexes, though both cause a physical sort ordering to the data. The difference is apparent in their very names. A primary index deals with primary keys. Meaning, each primary key must be unique (otherwise it would not be a primary key). A clustering index deals with anything that is not a primary key and, by definition, may be allowed to be non-unique. This is where the word "cluster" comes from. If you sort data that is not primary, it means it can repeat. When repeated data appears together, it is considered a "cluster."