I'm building a queuing mechanism of sorts. There are rows of data that need processing, and a status flag. I'm using an update .. returning
clause to manage it:
UPDATE stuff
SET computed = 'working'
WHERE id = (SELECT id from STUFF WHERE computed IS NULL LIMIT 1)
RETURNING *
Is the nested select part the same lock as the update, or do I have a race condition here? If so, does the inner select need to be a select for update
?
While Erwin's suggestion is possibly the simplest way to get correct behavior (so long as you retry your transaction if you get an exception with
SQLSTATE
of 40001), queuing applications by their nature tend to work better with requests blocking for a chance to take their turn at the queue than with the PostgreSQL implementation ofSERIALIZABLE
transactions, which allows higher concurrency and is somewhat more "optimistic" about the chances of collision.The example query in the question, as it stands, in the default
READ COMMITTED
transaction isolation level would allow two (or more) concurrent connections to both "claim" the same row from the queue. What will happen is this:UPDATE
phase.COMMIT
orROLLBACK
of T1.id
matches), and also "claims" the row.It can be modified to work correctly (if you are using a version of PostgreSQL which allows the
FOR UPDATE
clause in a subquery). Just addFOR UPDATE
to the end of the subquery which selects the id, and this will happen:COMMIT
orROLLBACK
of T1.At the
REPEATABLE READ
orSERIALIZABLE
transaction isolation level, the write conflict would throw an error, which you could catch and determine was a serialization failure based on the SQLSTATE, and retry.If you generally want SERIALIZABLE transactions but you want to avoid retries in the queuing area, you might be able to accomplish that by using an advisory lock.
If you are the only user, the query should be fine. In particular, there is no race condition or deadlock within the query itself (between the outer query and the subquery). I quote the manual here:
For concurrent use, the matter may be more complicated. You would be on the safe side with
SERIALIZABLE
transaction mode:You need to prepare for serialization failures and retry your query in such a case.
But I am not entirely sure if this isn't overkill. I'll ask @kgrittn to stop by .. he is the expert with concurrency and serializable transactions ..
And he did. :)
Best of both worlds
Run the query in default transaction mode
READ COMMITTED
.For Postgres 9.5 or later use
FOR UPDATE SKIP LOCKED
. See:For older versions recheck the condition
computed IS NULL
explicitly in the outerUPDATE
:As @kgrittn's advised in the comment to his answer, this query could come up empty, without having done anything, in the (unlikely) case it got intertwined with a concurrent transaction.
Therefore, it would work much like the first variant in transaction mode
SERIALIZABLE
, you would have to retry - just without the performance penalty.The only problem: While the conflict is very unlikely because the window of opportunity is just so tiny, it can happen under heavy load. You could not tell for sure whether there are finally no more rows left.
If that does not matter (like in your case), you are done here.
If it does, to be absolutely sure, start one more query with explicit locking after you get an empty result. If this comes up empty, you are done. If not, continue.
In plpgsql it could look like this:
That should give you the best of both worlds: performance and reliability.