INSERT or SELECT strategy to always return a row?

Posted 2019-02-27 10:47

Question:

Using Postgres 9.6, I have followed the strategy recommended in https://stackoverflow.com/a/40325406/435563 to do an INSERT or SELECT and return the resulting id:

with ins as (
  insert into prop (prop_type, norm, hash, symbols)
  values (
    $1, $2, $3, $4
  ) on conflict (hash) do
    update set prop_type = 'jargon' where false
  returning id)
select id from ins
union all
select id from prop where hash = $3

However, sometimes this returns nothing. I would have expected it to return a row no matter what. How can I fix it to ensure it always returns an id?

NB, despite not returning a row, the row does seem to exist on inspection. I believe the problem may be related to trying to add the same record via two sessions simultaneously.

The table in question is defined as:

create table prop (
  id serial primary key,
  prop_type text not null references prop_type(name),
  norm text not null,
  hash text not null unique,
  symbols jsonb
);

Data:

EDT DETAIL:  parameters: $1 = 'jargon', $2 = 'j2', $3 = 'lXWkZSmoSE0mZ+n4xpWB', $4 = '[]'

If I change prop_type = 'jargon' to prop_type = 'foo' it works! It would seem the lock isn't taken if the expression wouldn't change anything even given the where false clause. Does this really need to depend on my guessing a value that wouldn't be in the row, though? Or is there a better way to ensure you get the lock?

--- UPDATE ---

The overall situation is that the application tried to save a directed acyclic graph using a connection pool (...with autocommit), and was using this query to get the id while winnowing out duplicates. [It turns out to be much smarter to use a transaction and serialize everything through one connection. But the behavior under contention here is odd.]

The foreign key constraint doesn't seem to affect the insert -- e.g.:

create table foo(i int unique, prop_id int references prop(id));
insert into foo values (1, 208);
insert into foo values (1, 208) 
on conflict (i) do update set prop_id = 208 where false;
--> INSERT 0 0
insert into foo values (1, 208) 
on conflict (i) do update set prop_id = -208 where false;
--> INSERT 0 0

Note one with valid fk 208, the other with invalid -208. If I connect a select onto either of these with the full pattern, then in situations without contention, they both return i = 1 as expected.
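
For concreteness, a minimal sketch of the full pattern applied to foo, assuming it has the same shape as the prop query above (the returned column and the final where clause are chosen for illustration):

```sql
-- Sketch only: the prop query's pattern transplanted onto foo.
with ins as (
  insert into foo values (1, 208)
  on conflict (i) do update set prop_id = 208 where false  -- update never runs
  returning i                     -- empty when the row already exists
)
select i from ins
union all
select i from foo where i = 1;    -- picks up the pre-existing row
```

Without contention this yields one row either way: from ins when the insert actually happens, and from the second select when the row is already there.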

Answer 1:

Your observation seems impossible. The above command should always return an id, either for the newly inserted row or for the pre-existing row. Concurrent writes cannot mess with this since existing conflicting rows are locked. Explanation in this related answer:

  • How to use RETURNING with ON CONFLICT in PostgreSQL?

Unless an exception is raised, of course. You get an error message instead of a result in that case. Did you check that? Do you have error-handling in place? (In case your app somehow discards error messages: 1) Fix that. 2) There is an additional entry in the DB log with default logging settings.)

I do see a FK constraint in your table definition:

prop_type text not null references prop_type(name),

Trying to insert a row that violates the constraint raises exactly such an exception. If there is no row with name = 'jargon' in table prop_type, you get:

ERROR:  insert or update on table "prop" violates foreign key constraint "prop_prop_type_fkey"
DETAIL:  Key (prop_type)=(jargon) is not present in table "prop_type".

Demo:

dbfiddle here

Your observation would fit the crime:

"If I change prop_type = 'jargon' to prop_type = 'foo' it works!"

But your explanation is based on misconceptions:

"It would seem the lock isn't taken if the expression wouldn't change anything even given the where false clause."

That's not how Postgres works. The lock is taken either way (explanation in above linked answer), and the Postgres locking mechanism never even considers how the new row compares to the old.

"Does this really need to depend on my guessing a value that wouldn't be in the row, though? Or is there a better way to ensure you get the lock?"

No. And no.

If missing FK values are indeed the problem, you might add missing (distinct) values in a single statement with data-modifying CTEs. Simple for single-row inserts like you demonstrate, but works for inserting many rows at once, too. Related:

  • How do I insert a row which contains a foreign key?
  • INSERT rows into multiple tables in a single query, selecting from an involved table
  • Can INSERT [...] ON CONFLICT be used for foreign key violations?
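
As a rough sketch of that idea for the single-row case: a data-modifying CTE first adds the prop_type value if it is missing, then the insert-or-select pattern from the question runs in the same statement. It assumes prop_type needs no columns other than name (its definition is not shown in the question) and uses the parameter values from the logged DETAIL line as literals.

```sql
-- Sketch only; assumes prop_type(name) has no other required columns.
with new_type as (
  insert into prop_type (name)
  values ('jargon')
  on conflict (name) do nothing        -- leave an existing type untouched
), ins as (
  insert into prop (prop_type, norm, hash, symbols)
  values ('jargon', 'j2', 'lXWkZSmoSE0mZ+n4xpWB', '[]')
  on conflict (hash) do
    update set prop_type = 'jargon' where false
  returning id
)
select id from ins
union all
select id from prop where hash = 'lXWkZSmoSE0mZ+n4xpWB';
```

The new_type CTE runs to completion even though nothing reads from it, and the foreign key is checked at the end of the statement, so the freshly added type should satisfy the constraint.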


Answer 2:

https://www.postgresql.org/docs/9.5/static/sql-insert.html

ON CONFLICT DO UPDATE guarantees an atomic INSERT or UPDATE outcome; provided there is no independent error, one of those two outcomes is guaranteed, even under high concurrency.

This addresses the lock you mention in your updated post. As for the original question about returning a row: I read it carelessly at first. Now that I have noticed the where false: with this clause a row is not always returned. For example:

t=# create table a(i int, e int);
CREATE TABLE
t=# insert into a select 1,1;
INSERT 0 1
t=# create unique index b on a (i);
CREATE INDEX
-- now insert with on conflict do nothing:
t=# insert into a select 1,1 on conflict do nothing returning *,xmax,xmin;
 i | e | xmax | xmin
---+---+------+------
(0 rows)

INSERT 0 0
-- where false same effect - no rows
t=# insert into a select 1,1 on conflict(i) do update set e=2 where false returning *,xmax,xmin;
 i | e | xmax | xmin
---+---+------+------
(0 rows)
-- now insert without conflict:
t=# insert into a select 2,2 on conflict(i) do update set e=2 where EXCLUDED.e=1 returning *,xmax;
 i | e | xmax
---+---+------
 2 | 2 |    0
(1 row)
INSERT 0 1
-- now insert with update on conflict:
t=# insert into a select 1,1 on conflict(i) do update set e=2 where EXCLUDED.e=1 returning *,xmax;
 i | e |   xmax
---+---+-----------
 1 | 2 | 126943767
(1 row)
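
The xmax column in the output above also hints at how to tell the two outcomes apart: it is 0 for the freshly inserted row and non-zero for the row that went through the conflict update. A small sketch building on that, using the same table a and a hypothetical new key 3. Note that xmax is a system column, so relying on it is an assumption about implementation details rather than a documented contract:

```sql
-- Sketch only: label each returned row by inspecting xmax.
insert into a select 3, 1
on conflict (i) do update set e = 2 where EXCLUDED.e = 1
returning *, (xmax = 0) as inserted;  -- true for a new row, false for an updated one
```

When the update's where clause filters the row out (as with where false), this still returns no row at all, so it does not replace the union all branch from the question.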