I am curious as to whether
CREATE INDEX idx ON tbl (columns);
vs.
CREATE UNIQUE INDEX idx ON tbl (columns);
has a significant algorithmic performance benefit in PostgreSQL or MySQL implementations when scanning the indexed column(s), or whether the UNIQUE
keyword simply introduces a unique constraint alongside the index.
I imagine it is probably fair to say that there is a marginal benefit insofar as indexes are likely to be internally implemented as some sort of hash1-like structure, and collision handling by definition result in something other than O(1) performance. Given this premise, it is likely that if a large percentage of values are identical than the structure degenerates into something linear.
So, for purposes of my question, assume that the distribution of values is relatively discrete and uniform.
Thanks in advance!
1 Which is a matter of pure speculation for me, as I am not familiar with RDBM internals.
There is a small penalty during update/insert operations for having the unique constraint. It has to search before the insert/update operation to make sure the uniqueness constraint isn't violated.
If your data are unique, you should create a
UNIQUE
index on them.This implies no additional overhead and affects optimizer's decisions in certain cases so that it can choose a better algorithm.
In
SQL Server
and inPostgreSQL
, for instance, if you sort on aUNIQUE
key, the optimizer ignores theORDER BY
clauses used after that (since they are irrelevant), i. e. this query:will use an index on
col_unique
and won't sort onother_col
because it's useless.This query:
will also be converted into an
INNER JOIN
(as opposed to aSEMI JOIN
) if there is aUNIQUE
index onothertable.othercol
.An index always contains some kind of a pointer to the row (
ctid
inPostgreSQL
, row pointer inMyISAM
, primary key/uniquifier inInnoDB
) and the leaves are ordered on these pointers, so in fact every index leaf is unique is some way (though it may not be obvious).See this article in my blog for performance details:
UNIQUE
Well, usually indexes are B-Trees, not hashes (there are hash based indexes, but the most common index (at least in PostgreSQL) is bases on B Tree).
As for speed - unique should be faster - when index scanning finds row with given value, it doesn't have to search if there are any other rows with this value, and can finish scanning imemdiately.