I have the following table (PostgreSQL 8.3) which stores prices of some products. The prices are synchronised with another database, basically most of the fields below (apart from one) are not updated by our client - but instead dropped and refreshed every once-in-a-while to sync with another stock database:
CREATE TABLE product_pricebands (
template_sku varchar(20) NOT NULL,
colourid integer REFERENCES colour (colourid) ON DELETE CASCADE,
currencyid integer NOT NULL REFERENCES currency (currencyid) ON DELETE CASCADE,
siteid integer NOT NULL REFERENCES site (siteid) ON DELETE CASCADE,
master_price numeric(10,2),
my_custom_field boolean,
UNIQUE (template_sku, siteid, currencyid, colourid)
);
On the synchronisation, I basically DELETE most of the data above except for data WHERE my_custom_field is TRUE (if it's TRUE, it means the client updated this field via their CMS and therefore this record should not be dropped). I then INSERT 100s to 1000s of rows into the table, and UPDATE where the INSERT fails (i.e. where the combination of (template_sku, siteid, currencyid, colourid) already exists).
My question is - what best practice should be applied here to create a primary key? Is a primary key even needed? I wanted to make the primary key = (template_sku, siteid, currencyid, colourid) - but the colourid field can be NULL, and using it in a composite primary key is not possible.
From what I read on other forum posts, I think I have done the above correctly, and just need to clarify:
1) Should I use a "serial" primary key just in case I ever need one? At the moment I don't, and don't think I ever will, because the important data in the table is the price and my custom field, only identified by the (template_sku, siteid, currencyid, colourid) combination.
2) Since (template_sku, siteid, currencyid, colourid) is the combination that I will use to query a product's price, should I add any further indexing to my columns, such as the "template_sku" which is a varchar? Or is the UNIQUE constraint a good index already for my SELECTs?
You can easily add a serial column later if you need one:
The column will be filled with unique values automatically. You can even make it the primary key in the same statement (if no primary key is defined, yet):
If you reference the table from other tables I would advise to use such a surrogate primary key, because it is rather unwieldy to link by four columns. It is also slower in SELECTs with JOINs.
Either way, you should define a primary key. The UNIQUE index including a nullable column is not a full replacement. It allows duplicates for combinations including a NULL value, because two NULL values are never considered the same. This can lead to trouble.
As
you might want to create two unique indexes. The combination
(template_sku, siteid, currencyid, colourid)
cannot be aPRIMARY KEY
, because of the nullablecolourid
, but you can create aUNIQUE
constraint like you already have (implementing an index automatically):This index perfectly covers the queries you mention in 2).
Create a partial unique index in addition if you want to avoid "duplicates" with
(colourid IS NULL)
:To cover all bases. I wrote more about that technique in a related answer on dba.SE.
The simple alternative to the above is to make
colourid
NOT NULL and create a primary key instead of the aboveproduct_pricebands_uni_idx
.Also, as you
for your refill operation, it will be faster to drop indexes, that are not needed during the refill operation, and recreate those afterwards. It is faster by an order of magnitude to build an index from scratch than to add all rows incrementally.
How do you know, which indexes are used (needed)?
EXPLAIN ANALYZE
.It may also be faster to select the few rows with
my_custom_field = TRUE
into a temporary table,TRUNCATE
the base table and re-INSERT the survivors. Depends on whether you have foreign keys defined. Would look like this:This avoids a lot of vacuuming.