I would like to understand better a mechanism of locking in postgres.
Let's say that tree can have apples (via foreign key on apple table). It seems that when selecting a tree for update lock is obtained on an apple. However, the operation is not blocked even if someone else already holds a lock on this apple.
Why is it so?
p.s. Please don't suggest to remove "select for update".
Scenario
Transaction 1 Transaction 2
BEGIN .
update apple; .
. BEGIN
. select tree for update;
. update apple;
. --halts because of the other transaction locking an apple
update apple; .
-- deadlock .
COMMIT
--transaction succeeds
Code
If you want to try it in your postgres - here is a code you can copy/paste.
I have a following db schema
CREATE TABLE trees (
id integer primary key
);
create table apples (
id integer primary key,
tree_id integer references trees(id)
);
and very simple data
insert into trees values(1);
insert into apples values(1,1);
There are two simple transactions. One is updating apples, the second is locking a tree and updating an apple.
BEGIN;
UPDATE apples SET id = id WHERE id = 1;
-- run second transaction in paralell
UPDATE apples SET id = id WHERE id = 1;
COMMIT;
BEGIN;
SELECT id FROM trees WHERE id = 1 FOR UPDATE;
UPDATE apples SET id = id WHERE id = 1;
COMMIT;
When I run them - deadlock occurs on the second update of first transaction.
ERROR: deadlock detected
DETAIL: Process 81122 waits for ShareLock on transaction 227154; blocked by process 81100.
Process 81100 waits for ShareLock on transaction 227153; blocked by process 81122.
CONTEXT: SQL statement "SELECT 1 FROM ONLY "public"."trees" x WHERE "id" OPERATOR(pg_catalog.=) $1 FOR SHARE OF x"
Just a wild guess: you're running into an issue related to an implementation detail...
Specifically, your select tree for update
statement acquires an exclusive lock on the trees. And the update apples
statements obtain an exclusive lock on the relevant apples.
When you run the update on apples, Postgres' foreign-key related per row triggers fire, to ensure that the tree_id
exists. I don't recall their precise names off the top of my head, but they're in the catalog and there are bits and pieces in the documentation that reference them explicitly or implicitly, e.g.:
create constraint trigger ... on ... from ...
http://www.postgresql.org/docs/current/static/sql-createtrigger.html
At any rate, these triggers will run something that amounts to the following:
select exists (select 1 from trees where id = 1);
And therein lies your problem: the exclusive access due to the select for update
makes it wait for transaction 2 to release the lock on trees in order to finalize its update statement on apples, but transaction 2 is waiting for transaction 1 to complete in order to obtain a lock on apples so as to begin its update statement on apples.
As a result, Postgres bails with a deadlock.
It looks like index locks are not held for the whole duration of the transaction. I think the main issue is that transaction 1 is doing the same UPDATE
twice, but it needs to acquire more locks to do the second UPDATE
.
According to the docs, index locks are only held for a short time. Unlike data locks, they are not held until the transaction completes. Let's look at the timeline in more detail.
Transaction 1 does the first UPDATE
. This acquires a row level lock on the row in apples
. During the operation, it also acquires a lock on the index in trees
. The transaction has not yet committed, so the row level data lock is still held by transation 1. However, the index lock on trees
is immediately released. Not sure why Postgres does this for all index types.
Transaction 2 comes along and locks trees
for update. This locks both the data and the index. This does not block since Transaction 1 has already released the index lock. This time, both locks are held until the end of the transaction. Not sure why this index lock is held while the other one is released.
Transaction 1 comes back and tries to UPDATE
again. The lock on apples
is fine, since it already has it. The lock on trees
, however, blocks since Transaction 2 has it already.
Adding the UPDATE
in Transaction 2 makes it wait on Transaction 1, causing deadlock.
EDIT:
I came back to investigate this some more now that I have Postgres installed. It is actually really weird. I looked at pg_locks
after committing transaction 2.
Transaction 1 has the following locks:
- RowExclusive on apples_pkey and apples
- Exclusive on its transactionid and virtualxid
Transaction 2 has the following locks (and a lot of other irrelevant ones):
- AccessShare on trees_pkey
- RowShare on trees
- Exclusive on its transactionid and virtualxid
- RowExclusive on apples_pkey and apples
- Exclusive on a tuple in apples
Transaction 2 is also waiting to acquire a Share lock on Transaction 1.
Interestingly, two transactions can hold a RowExclusive lock on the same table. However, an Exclusive locks conflicts with a Share, so Transaction 2 is waiting on Transaction 1's transaction id. The docs mention transaction locks as a way to wait for the other transaction. Hence, it looks like Transaction 2, although committed, is still waiting for Transaction 1.
When Transaction 1 continues, it wants to acquire a Share lock on Transaction 2, and this creates the deadlock. Why does it want to acquire a share lock on Transaction 2? Not too sure about that. The docs hint that this information is not available in pg_locks
. I'm going to guess that this is related to MVCC, but it's still a mystery to me.