How should I deal with my UNIQUE constraints durin

2019-06-07 00:45发布

问题:

when I execute the following sql (which is contained by a sql file generated by pg_dump of Postgres9.4) in greenplum:

CREATE TABLE "public"."trm_concept" (
"pid" int8 NOT NULL,
"code" varchar(100)  NOT NULL,
"codesystem_pid" int8,
"display" varchar(400) ,
"index_status" int8,
CONSTRAINT "trm_concept_pkey" PRIMARY KEY ("pid"),
CONSTRAINT "idx_concept_cs_code" UNIQUE ("codesystem_pid", "code")
);

I got this error:

ERROR:  Greenplum Database does not allow having both PRIMARY KEY and UNIQUE constraints

why greenplum doesn't allow this? I really need this unique constraint to guarantee some rule, how can I fix it in greenplum?

回答1:

  • a UNIQUE constraint is done with a btree index
  • Primary Keys imply UNIQUE and NOT NULL
  • GreenPlum distributes to child/shards or whatever on whatever you claim as UNIQUE.

For GreenTree to implement a UNIQUE constraint -- as you want -- that index would have to be

  • copied to every child
  • updated in an ACID compliant manner

Doing that would totally remove the benefits of running GreenPlum. You may as well move back to PostgreSQL.

From the docs about CREATE TABLE

When creating a table, there is an additional clause to declare the Greenplum Database distribution policy. If a DISTRIBUTED BY or DISTRIBUTED RANDOMLY clause is not supplied, then Greenplum assigns a hash distribution policy to the table using either the PRIMARY KEY (if the table has one) or the first column of the table as the distribution key. Columns of geometric or user-defined data types are not eligible as Greenplum distribution key columns. If a table does not have a column of an eligible data type, the rows are distributed based on a round-robin or random distribution. To ensure an even distribution of data in your Greenplum Database system, you want to choose a distribution key that is unique for each record, or if that is not possible, then choose DISTRIBUTED RANDOMLY.

Same doc says this about PRIMARY KEY,

For a table to have a primary key, it must be hash distributed (not randomly distributed), and the primary key The column(s) that are unique must contain all the columns of the Greenplum distribution key.

Here is what the docs on CREATE INDEX

In Greenplum Database, unique indexes are allowed only if the columns of the index key are the same as (or a superset of) the Greenplum distribution key. On partitioned tables, a unique index is only supported within an individual partition - not across all partitions.



回答2:

Many (perhaps most) analytic databases offer no support for constraints like these. Greenplum is somewhat unique in supporting enforceable PRIMARY KEY.

FWIW, in Redshift I run extra logic after any ETL step that changes data to ensure that my constraints are still valid.

You could try the same approach here but I would strongly suggest that you partition the table on the columns which will be checked.