How to create an Index in Amazon Redshift

2020-02-08 11:31发布

问题:

I'm trying to create indexes in Amazon Redshift but I received an error

create index on session_log(UserId);

UserId is an integer field.

回答1:

If you try and create an index (with a name) on a Redshift table:

create index IX1 on "SomeTable"("UserId");

You'll receive the error

An error occurred when executing the SQL command: create index IX1 on "SomeTable"("UserId") ERROR: SQL command "create index IX1 on "SomeTable"("UserId")" not supported on Redshift tables.

This is because, like other data warehouses, Redshift uses columnar storage, and as a result, many of the indexing techniques (like adding non-clustered indexes) used in other RDBMS aren't applicable.

You do however have the option of providing a single sort key per table, and you can also influence performance with a distribution key for sharding your data, and selecting appropriate compression encodings for each column to minimize storage and I/O overheads.

For example, in your case, you may elect to use UserId as a sort key:

create table if not exists "SomeTable"
(
    "UserId" int,
    "Name" text
)
sortkey("UserId");

You might want to read a few primers like these



回答2:

You can Define Constraints but will be informational only, as Amazon says: they are not enforced by Amazon Redshift. Nonetheless, primary keys and foreign keys are used as planning hints and they should be declared if your ETL process or some other process in your application enforces their integrity.

Some services like pipelines with insert mode (REPLACE_EXISTING) will need a primary key defined in your table.

For other performance purposes the Stuart's response is correct.



回答3:

Redshift allow to create primary key

create table user (
id int ,
phone_number int,
primary key(id))

but since Redshift does not enforce this constraints, primary key accepts duplicate values.

attached article on that issue

http://www.sqlhaven.com/amazon-redshift-what-you-need-to-think-before-defining-primary-key/