What do Clustered and Non clustered index actually-第2页回答

I have a limited exposure to DB and have only used DB as an application programmer. I want to know about Clustered and Non clustered indexes. I googled and what I found was :

A clustered index is a special type of index that reorders the way records in the table are physically stored. Therefore table can have only one clustered index. The leaf nodes of a clustered index contain the data pages. A nonclustered index is a special type of index in which the logical order of the index does not match the physical stored order of the rows on disk. The leaf node of a nonclustered index does not consist of the data pages. Instead, the leaf nodes contain index rows.

What I found in SO was What are the differences between a clustered and a non-clustered index?.

Can someone explain this in plain English?

标签： sql-server performance indexing clustered-index non-clustered-index

10条回答

查无此人

2楼-- · 2018-12-31 10:59

Clustered Index

Clustered indexes sort and store the data rows in the table or view based on their key values. These are the columns included in the index definition. There can be only one clustered index per table, because the data rows themselves can be sorted in only one order.

The only time the data rows in a table are stored in sorted order is when the table contains a clustered index. When a table has a clustered index, the table is called a clustered table. If a table has no clustered index, its data rows are stored in an unordered structure called a heap.

Nonclustered

Nonclustered indexes have a structure separate from the data rows. A nonclustered index contains the nonclustered index key values and each key value entry has a pointer to the data row that contains the key value. The pointer from an index row in a nonclustered index to a data row is called a row locator. The structure of the row locator depends on whether the data pages are stored in a heap or a clustered table. For a heap, a row locator is a pointer to the row. For a clustered table, the row locator is the clustered index key.

You can add nonkey columns to the leaf level of the nonclustered index to by-pass existing index key limits, and execute fully covered, indexed, queries. For more information, see Create Indexes with Included Columns. For details about index key limits see Maximum Capacity Specifications for SQL Server.

Reference: https://docs.microsoft.com/en-us/sql/relational-databases/indexes/clustered-and-nonclustered-indexes-described

0人赞添加讨论(0) 举报

墨雨无痕

3楼-- · 2018-12-31 11:02

A clustered index means you are telling the database to store close values actually close to one another on the disk. This has the benefit of rapid scan / retrieval of records falling into some range of clustered index values.

For example, you have two tables, Customer and Order:

Customer
----------
ID
Name
Address

Order
----------
ID
CustomerID
Price

If you wish to quickly retrieve all orders of one particular customer, you may wish to create a clustered index on the "CustomerID" column of the Order table. This way the records with the same CustomerID will be physically stored close to each other on disk (clustered) which speeds up their retrieval.

P.S. The index on CustomerID will obviously be not unique, so you either need to add a second field to "uniquify" the index or let the database handle that for you but that's another story.

Regarding multiple indexes. You can have only one clustered index per table because this defines how the data is physically arranged. If you wish an analogy, imagine a big room with many tables in it. You can either put these tables to form several rows or pull them all together to form a big conference table, but not both ways at the same time. A table can have other indexes, they will then point to the entries in the clustered index which in its turn will finally say where to find the actual data.

0人赞添加讨论(0) 举报

若你有天会懂

4楼-- · 2018-12-31 11:03

A very simple, non-technical rule-of-thumb would be that clustered indexes are usually used for your primary key (or, at least, a unique column) and that non-clustered are used for other situations (maybe a foreign key). Indeed, SQL Server will by default create a clustered index on your primary key column(s). As you will have learnt, the clustered index relates to the way data is physically sorted on disk, which means it's a good all-round choice for most situations.

0人赞添加讨论(0) 举报

浮光初槿花落

5楼-- · 2018-12-31 11:05

Let me offer a textbook definition on "clustering index", which is taken from 15.6.1 from Database Systems: The Complete Book:

We may also speak of clustering indexes, which are indexes on an attribute or attributes such that all of tuples with a fixed value for the search key of this index appear on roughly as few blocks as can hold them.

To understand the definition, let's take a look at Example 15.10 provided by the textbook:

A relation R(a,b) that is sorted on attribute a and stored in that order, packed into blocks, is surely clusterd. An index on a is a clustering index, since for a given a-value a1, all the tuples with that value for a are consecutive. They thus appear packed into blocks, execept possibly for the first and last blocks that contain a-value a1, as suggested in Fig.15.14. However, an index on b is unlikely to be clustering, since the tuples with a fixed b-value will be spread all over the file unless the values of a and b are very closely correlated.

Note that the definition does not enforce the data blocks have to be contiguous on the disk; it only says tuples with the search key are packed into as few data blocks as possible.

A related concept is clustered relation. A relation is "clustered" if its tuples are packed into roughly as few blocks as can possibly hold those tuples. In other words, from a disk block perspective, if it contains tuples from different relations, then those relations cannot be clustered (i.e., there is a more packed way to store such relation by swapping the tuples of that relation from other disk blocks with the tuples the doesn't belong to the relation in the current disk block). Clearly, R(a,b) in example above is clustered.

To connect two concepts together, a clustered relation can have a clustering index and nonclustering index. However, for non-clustered relation, clustering index is not possible unless the index is built on top of the primary key of the relation.

"Cluster" as a word is spammed across all abstraction levels of database storage side (three levels of abstraction: tuples, blocks, file). A concept called "clustered file", which describes whether a file (an abstraction for a group of blocks (one or more disk blocks)) contains tuples from one relation or different relations. It doesn't relate to the clustering index concept as it is on file level.

However, some teaching material likes to define clustering index based on the clustered file definition. Those two types of definitions are the same on clustered relation level, no matter whether they define clustered relation in terms of data disk block or file. From the link in this paragraph,

An index on attribute(s) A on a file is a clustering index when: All tuples with attribute value A = a are stored sequentially (= consecutively) in the data file

Storing tuples consecutively is the same as saying "tuples are packed into roughly as few blocks as can possibly hold those tuples" (with minor difference on one talking about file, the other talking about disk). It's because storing tuple consecutively is the way to achieve "packed into roughly as few blocks as can possibly hold those tuples".

0人赞添加讨论(0) 举报

上一页 1 2

What do Clustered and Non clustered index actually

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间