There is a table which has 200 rows, but the number of live tuples showing for it is much higher (around 60K).
select count(*) from subscriber_offset_manager;
count
-------
200
(1 row)
SELECT schemaname,relname,n_live_tup,n_dead_tup FROM pg_stat_user_tables where relname='subscriber_offset_manager' ORDER BY n_dead_tup
;
schemaname | relname | n_live_tup | n_dead_tup
------------+---------------------------+------------+------------
public | subscriber_offset_manager | 61453 | 5
(1 row)
But as seen from pg_stat_activity and pg_locks, we are not able to find any open connection touching this table.
SELECT query, state,locktype,mode
FROM pg_locks
JOIN pg_stat_activity
USING (pid)
WHERE relation::regclass = 'subscriber_offset_manager'::regclass
;
query | state | locktype | mode
-------+-------+----------+------
(0 rows)
I also tried a full vacuum on this table. The results:
- No rows are ever removed.
- Sometimes all the live tuples become dead tuples.
Here is the output:
vacuum FULL VERBOSE ANALYZE subscriber_offset_manager;
INFO: vacuuming "public.subscriber_offset_manager"
INFO: "subscriber_offset_manager": found 0 removable, 67920 nonremovable row versions in 714 pages
DETAIL: 67720 dead row versions cannot be removed yet.
CPU 0.01s/0.06u sec elapsed 0.13 sec.
INFO: analyzing "public.subscriber_offset_manager"
INFO: "subscriber_offset_manager": scanned 710 of 710 pages, containing 200 live rows and 67720 dead rows; 200 rows in sample, 200 estimated total rows
VACUUM
SELECT schemaname,relname,n_live_tup,n_dead_tup FROM pg_stat_user_tables where relname='subscriber_offset_manager' ORDER BY n_dead_tup
;
schemaname | relname | n_live_tup | n_dead_tup
------------+---------------------------+------------+------------
public | subscriber_offset_manager | 200 | 67749
And after 10 seconds:
SELECT schemaname,relname,n_live_tup,n_dead_tup FROM pg_stat_user_tables where relname='subscriber_offset_manager' ORDER BY n_dead_tup
;
schemaname | relname | n_live_tup | n_dead_tup
------------+---------------------------+------------+------------
public | subscriber_offset_manager | 68325 | 132
How our app queries this table:
Our application generally selects some rows and, based on some business calculation, updates them.
select query -- select based on some id
select * from subscriber_offset_manager where shard_id=1 ;
update query -- update some other column for this selected shard id
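For illustration only (current_offset is a made-up column name; the real query updates a different business column):
update subscriber_offset_manager set current_offset = ? where shard_id = 1 ;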
Around 20 threads do this in parallel, and each thread works on only one row.
- The app is written in Java, and we use Hibernate for DB operations.
- The PostgreSQL version is 9.3.24.
One more interesting observation: when I stop my Java app and then run a full vacuum, it works fine (the number of rows and live tuples become equal). So something goes wrong when we select and update continuously from the Java app.
Problem/Issue
These live tuples sometimes become dead tuples, and after some time become live again.
Due to the above behaviour, selects from the table take time and increase the load on the server, because lots of live/dead tuples are there.
I found the issue ☺.
To understand the issue, consider the following flow:
Thread 1 -
Many threads of type Thread-1 run in parallel.
Thread 2 -
Temporary solution - if I close all the connections made by Thread-2 using pg_cancel_backend, then vacuuming starts working; a sketch of the call is below.
We have also recreated the issue many times, tried this solution, and it worked every time.
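A sketch of the kind of statement we ran, assuming Thread-2's backends can be identified by their idle-in-transaction state (the filter is illustrative):
SELECT pg_cancel_backend(pid)
FROM pg_stat_activity
WHERE state = 'idle in transaction' ; -- illustrative filter for Thread-2's connections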
There are still some doubts which are not answered, and some more surprising observations.
@Erwin Brandstetter and @Laurenz Albe, do you know of a bug related to Postgres or JDBC that could explain this?
There might be locks after all; your query might be misleading: pg_locks.pid can be NULL, and then the join would eliminate those rows. The manual for Postgres 9.3 on pg_locks.pid:
Process ID of the server process holding or awaiting this lock, or null if the lock is held by a prepared transaction
Emphasis mine. (Still the same in Postgres 10.)
Do you get anything for the simple query?
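That is, pg_locks on its own, without the join to pg_stat_activity:
SELECT * FROM pg_locks
WHERE relation = 'subscriber_offset_manager'::regclass ;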
This could explain why VACUUM complains:
DETAIL: 67720 dead row versions cannot be removed yet.
This, in turn, would point to problems in your application logic / queries, locking more rows than necessary.
My first idea would be long running transactions, where even a simple SELECT (acquiring a lowly ACCESS SHARE lock) can block VACUUM from doing its job. 20 threads in parallel might chain up and lock out VACUUM indefinitely. Keep your transactions (and their locks) as brief as possible, and make sure your queries are optimized and don't lock more rows than necessary.
One more thing to note: the transaction isolation levels SERIALIZABLE and REPEATABLE READ make it much harder for VACUUM to clean up. The default READ COMMITTED mode is less restrictive, but VACUUM can still be blocked as discussed.
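To look for such long running transactions, a query along these lines should work (pg_stat_activity.xact_start exists in 9.3):
SELECT pid, state, now() - xact_start AS xact_age, query
FROM pg_stat_activity
WHERE xact_start IS NOT NULL
ORDER BY xact_start ;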
I know three things that keep VACUUM from doing its job:
- Long running transactions (see the pg_stat_activity query above).
- Prepared transactions that did not get committed.
- Stale replication slots.
See my blog post for details; quick checks for the last two are sketched below.
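As a sketch (pg_prepared_xacts exists in 9.3; pg_replication_slots only exists from 9.4 on, so that one cannot apply to this 9.3 server):
-- prepared transactions that were never committed or rolled back
SELECT gid, prepared, owner, database FROM pg_prepared_xacts ORDER BY prepared ;
-- replication slots and how far back they hold the xmin horizon (9.4+ only)
SELECT slot_name, active, xmin, restart_lsn FROM pg_replication_slots ;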