PostgreSQL, triggers, and concurrency to enforce a

2019-08-11 01:18发布

问题:

I want to define a trigger in PostgreSQL to check that the inserted row, on a generic table, has the the property: "no other row exists with the same key in the same valid time" (the keys are sequenced keys). In fact, I has already implemented it. But since the trigger has to scan the entire table, now i'm wondering: is there a need for a table-level lock? Or this is managed someway by the PostgreSQL itself?

Here is an example. In the upcoming PostgreSQL 9.0 I would have defined the table in this way:

CREATE TABLE medicinal_products
(
aic_code CHAR(9), -- sequenced key
full_name VARCHAR(255),
market_time PERIOD,
    EXCLUDE USING gist
    (aic_code CHECK WITH =,
    market_time CHECK WITH &&)
);

but in fact I have been defined it like this:

CREATE TABLE medicinal_products
(
PRIMARY KEY (aic_code, vs),
aic_code CHAR(9), -- sequenced key
full_name VARCHAR(255),
vs DATE NOT NULL,
ve DATE,
CONSTRAINT valid_time_range
        CHECK (ve > vs OR ve IS NULL)
);

Then, I have written a trigger that check the costraint: "two distinct medicinal products can have the same code in two different periods, but not in same time".

So the code:

INSERT INTO medicinal_products VALUES ('1','A','2010-01-01','2010-04-01');
INSERT INTO medicinal_products VALUES ('1','A','2010-03-01','2010-06-01');

return an error.

回答1:

One solution is to have a second table to use for detecting clashes, and populate that with a trigger. Using the schema you added into the question:

CREATE TABLE medicinal_product_date_map(
   aic_code char(9) NOT NULL,
   applicable_date date NOT NULL,
   UNIQUE(aic_code, applicable_date));

(note: this is the second attempt due to misreading your requirement the first time round. hope it's right this time).

Some functions to maintain this table:

CREATE FUNCTION add_medicinal_product_date_range(aic_code_in char(9), start_date date, end_date date)
RETURNS void STRICT VOLATILE LANGUAGE sql AS $$
  INSERT INTO medicinal_product_date_map
  SELECT $1, $2 + offset
  FROM generate_series(0, $3 - $2)
$$;
CREATE FUNCTION clr_medicinal_product_date_range(aic_code_in char(9), start_date date, end_date date)
RETURNS void STRICT VOLATILE LANGUAGE sql AS $$
  DELETE FROM medicinal_product_date_map
  WHERE aic_code = $1 AND applicable_date BETWEEN $2 AND $3
$$;

And populate the table first time with:

SELECT count(add_medicinal_product_date_range(aic_code, vs, ve))
FROM medicinal_products;

Now create triggers to populate the date map after changes to medicinal_products: after insert calls add_, after update calls clr_ (old values) and add_ (new values), after delete calls clr_.

CREATE FUNCTION sync_medicinal_product_date_map()
RETURNS trigger LANGUAGE plpgsql AS $$
BEGIN
  IF TG_OP = 'UPDATE' OR TG_OP = 'DELETE' THEN
    PERFORM clr_medicinal_product_date_range(OLD.aic_code, OLD.vs, OLD.ve);
  END IF;
  IF TG_OP = 'UPDATE' OR TG_OP = 'INSERT' THEN
    PERFORM add_medicinal_product_date_range(NEW.aic_code, NEW.vs, NEW.ve);
  END IF;
  RETURN NULL;
END;
$$;
CREATE TRIGGER sync_date_map
  AFTER INSERT OR UPDATE OR DELETE ON medicinal_products
  FOR EACH ROW EXECUTE PROCEDURE sync_medicinal_product_date_map();

The uniqueness constraint on medicinal_product_date_map will trap any products being added with the same code on the same day:

steve@steve@[local] =# INSERT INTO medicinal_products VALUES ('1','A','2010-01-01','2010-04-01');
INSERT 0 1
steve@steve@[local] =# INSERT INTO medicinal_products VALUES ('1','A','2010-03-01','2010-06-01');
ERROR:  duplicate key value violates unique constraint "medicinal_product_date_map_aic_code_applicable_date_key"
DETAIL:  Key (aic_code, applicable_date)=(1        , 2010-03-01) already exists.
CONTEXT:  SQL function "add_medicinal_product_date_range" statement 1
SQL statement "SELECT add_medicinal_product_date_range(NEW.aic_code, NEW.vs, NEW.ve)"
PL/pgSQL function "sync_medicinal_product_date_map" line 6 at PERFORM

This depends on the values being checked for having a discrete space- which is why I asked about dates vs timestamps. Although timestamps are, technically, discrete since Postgresql only stores microsecond-resolution, adding an entry to the map table for every microsecond the product is applicable for is not practical.

Having said that, you could probably also get away with something better than a full-table scan to check for overlapping timestamp intervals, with some trickery on looking for only the first interval not after or not before... however, for easy discrete spaces I prefer this approach which IME can also be handy for other things too (e.g. reports that need to quickly find which products are applicable on a certain day).

I also like this approach because it feels right to leverage the database's uniqueness-constraint mechanism this way. Also, I feel it will be more reliable in the context of concurrent updates to the master table: without locking the table against concurrent updates, it would be possible for a validation trigger to see no conflict and allow inserts in two concurrent sessions, that are then seen to conflict when both transaction's effects are visible.



回答2:

Just a thought, in case the valid time blocks could be coded with a number or something, creating a UNIQUE index on Id+TimeBlock would be blazingly fast and resolve all table lock problems.

It is managed by PostgreSQL itself. On a select it acquires an ACCESS_SHARE lock which means that you can query the table but do not perform updates.

A radical solution which might help you is to use a cache like ehcache or memcached to store the id/timeblock info and not use the postgresql at all. Many can be persisted so they would survive a server restart and they do not exhibit this locking behavior.



回答3:

Why can't you use a UNIQUE constraint? Will be much faster (it's an index) and easier.