可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效,请关闭广告屏蔽插件后再试):
问题:
Say I have a table with a large number of rows and one of the columns which I want to index can have one of 20 values.
If I were to put an index on the column would it be large?
If so, why? If I were to partition the data into the data into 20 tables, one for each value of the column, the index size would be trivial but the indexing effect would be the same.
回答1:
It's not the indexes that will suck. It's putting indexes on the wrong columns that will suck.
Seriously though, why would you need a table with a single column? What would the meaning of that data be? What purpose would it serve?
And 20 tables? I suggest you read up on database design first, or otherwise explain to us the context of your question.
回答2:
Indexes (or indices) don't suck. A lot of very smart people have spent a truly remarkable amount of time of the last several decades ensuring that this is so.
Your schema, however, lacking the same amount of expertise and effort, may suck very badly indeed.
Partitioning, in the case described is equivalent to applying a clustered index. If the table is sorted otherwise (or is in arbitrary order) then the index necessarily has to occupy much more space. Depending on the platform, a non-clustered index may reduce in size as the sortedness of the rows with respect to the indexed value increases.
YMMV.
回答3:
The short answer:
Do indexes suck: Yes and No
The longer answer:
They don't suck if used properly. Maybe you should start reading about how indexes work, why they can work and why they sometimes don't work.
Good starting points:
http://www.sqlservercentral.com/articles/Indexing/
回答4:
No indexes don't suck, but you have to pay attention to how you use them or they can backfire on the performance of your queries.
First: Schema / design
Why would you create a table with only one column? That's probably taking normalization one step to far. Database design is one of the most important things to consider in optimizing performance
Second: Indexes
In a nutshell the indexes will help the database to perform a binary search of your record. Without an index on a column (or set of columns) the database will often fall back to a table scan. A table scan is very expensive because it involves enumerating each and every record.
It doesn't really matter THAT much for index scans how many records there are in the database table. Because of the (balanced) binary tree search doubling the amount of records will only result in one extra search step.
Determine the primary key of your table, SQL will automatically place a clustered index on that column(s). Clustered indexes perform really well. In addition you can place non-clustered indexes on columns that are used often in SELECT, JOIN, WHERE, GROUP BY and ORDER BY statements. Do remember that indexes have a certain overlap, try to never include your clustered index into a non-clustered index.
Also interesting might be the fill factor on the indexes. Do you want to optimize your table for reads (high fill factor - less storage, less IO) or for writes (low fill factor more storage, less rebuilding your database pages).
Third: Partitioning
One of the reasons to use partitioning is to optimize your data access. Let's say you have 1 million records of which 500,000 records are no longer relevant but stored for archiving purposes. In this case you could decide to partition the table and store the 500,000 old records on slow storage and the other 500,000 records on fast storage.
To measure is to know
The best way to get insight in what happens is to measure what happens to your cpu and io. Microsoft SQL server has some tools like the Profiler and Execution plans in Management Studio that will tell you the duration of your query, number of read/writes and cpu usage. Also the execution plan will tell you which or IF indexes are being used. To your surprise you might see a table scan although you didn't expect it.
回答5:
Say I have a table with a large number of rows and one column which I want to index can have one of 20 values. If I were to put an index on the column would it be large?
The index size will be proportional to the number of your rows and the length of the indexed values.
The index keeps not only the indexed value, but also some kind of a pointer to the row (ROWID
in Oracle
, LCID
in PostgreSQL
, primary key in InnoDB
etc).
If you have 10,000
rows and a 1 distinct value, you will still have 10,000
records in your index.
If so, why? If I were to partition the data into the data into 20 tables, one for each value of the column, the index size would be trivial but the indexing effect would be the same
In this case, you would come with 20 indexes being same in size in total as your original one.
This technique is sometimes used in fact in such called partitioned indexes. It has its advantages and drawbacks.
回答6:
Standard b-tree indexes are best suited to fairly selective indexes, which this example would not be. You don't say what DBMS you are using; Oracle has another type of index called a bitmap index which is more suited to low-selectivity indexes in OLAP environments (since these indexes are expensive to maintain, making them unsuitable for OLTP environments).
The optimiser will decide bases on stats whether it thinks the index will help get the data in the fastest time; if it won't, the optmiser won't use it.
Partitioning is another strategy. In Oracle you can define a table as partitioned on some set of columns, and for the optimiser can automatically perform "partition elimination" like you suggest.
回答7:
Sorry, I'm not quite sure what you mean by "large".
If your index is clustered, all the data for each record will be on the same leaf page, thereby creating the most efficient index available to your table as long as you write your queries against it properly.
If your index is non-clustered, then only the index related data will be on your leaf pages. Then, depending on suchs things as how many other indexes you have, coupled with details like your fill factor, your index may or may not be efficient. In general, if you don't have a ton of indexes on your table, you should be safe.
The efficiency of your index will also be determined by the data type of the 20 values you're speaking of going into the column. If those are pre-defined values, then their details should probably be in a lookup table with a simple primary key datatype (like Int/Number). Then add that column to your table as a foreign key with an index on the column.
Ultimately, you could have a perfect index on a column. But it's best use will be determined for the most part by the queries you write. So if your queries make use of the indexes, you're golden.
回答8:
Indexes are purely for performance. If an index doesn't boost performance for the queries you're interested in, then it sucks.
As for disk usage, you have to weigh your concerns. Different SQL providers build indexes differently, but as a client, you generally trust that they do the best that can be done. In the case you're describing, a clustered index may be optimal for both size and performance.
回答9:
It would be large enough to hold those values for all the rows, in a sorted order.
Say you have 20 different strings of 4 characters, and 1 million rows, it would at least be 4 million bytes (or 8 if 16-bit unicode) to hold those values.