How do I find the last time that a PostgreSQL data

2020-05-18 20:05发布

问题:

I am working with a postgreSQL database that gets updated in batches. I need to know when the last time that the database (or a table in the database)has been updated or modified, either will do.

I saw that someone on the postgeSQL forum had suggested that to use logging and query your logs for the time. This will not work for me as that I do not have control over the clients codebase.

回答1:

You can write a trigger to run every time an insert/update is made on a particular table. The common usage is to set a "created" or "last_updated" column of the row to the current time, but you could also update the time in a central location if you don't want to change the existing tables.

So for example a typical way is the following one:

CREATE FUNCTION stamp_updated() RETURNS TRIGGER LANGUAGE 'plpgsql' AS $$
BEGIN
  NEW.last_updated := now();
  RETURN NEW;
END
$$;
-- repeat for each table you need to track:
ALTER TABLE sometable ADD COLUMN last_updated TIMESTAMP;
CREATE TRIGGER sometable_stamp_updated
  BEFORE INSERT OR UPDATE ON sometable
  FOR EACH ROW EXECUTE PROCEDURE stamp_updated();

Then to find the last update time, you need to select "MAX(last_updated)" from each table you are tracking and take the greatest of those, e.g.:

SELECT MAX(max_last_updated) FROM (
  SELECT MAX(last_updated) AS max_last_updated FROM sometable
  UNION ALL
  SELECT MAX(last_updated) FROM someothertable
) updates

For tables with a serial (or similarly-generated) primary key, you can try avoid the sequential scan to find the latest update time by using the primary key index, or you create indices on last_updated.

-- get timestamp of row with highest id
SELECT last_updated FROM sometable ORDER BY sometable_id DESC LIMIT 1

Note that this can give slightly wrong results in the case of IDs not being quite sequential, but how much accuracy do you need? (Bear in mind that transactions mean that rows can become visible to you in a different order to them being created.)

An alternative approach to avoid adding 'updated' columns to each table is to have a central table to store update timestamps in. For example:

CREATE TABLE update_log(table_name text PRIMARY KEY, updated timestamp NOT NULL DEFAULT now());
CREATE FUNCTION stamp_update_log() RETURNS TRIGGER LANGUAGE 'plpgsql' AS $$
BEGIN
  INSERT INTO update_log(table_name) VALUES(TG_TABLE_NAME);
  RETURN NEW;
END
$$;
-- Repeat for each table you need to track:
CREATE TRIGGER sometable_stamp_update_log
 AFTER INSERT OR UPDATE ON sometable
 FOR EACH STATEMENT EXECUTE stamp_update_log();

This will give you a table with a row for each table update: you can then just do:

SELECT MAX(updated) FROM update_log

To get the last update time. (You could split this out by table if you wanted). This table will of course just keep growing: either create an index on 'updated' (which should make getting the latest one pretty fast) or truncate it periodically if that fits with your use case, (e.g. take an exclusive lock on the table, get the latest update time, then truncate it if you need to periodically check if changes have been made).

An alternative approach- which might be what the folks on the forum meant- is to set 'log_statement = mod' in the database configuration (either globally for the cluster, or on the database or user you need to track) and then all statements that modify the database will be written to the server log. You'll then need to write something outside the database to scan the server log, filtering out tables you aren't interested in, etc.



回答2:

It looks like you can use pg_stat_database to get a transaction count and check if this changes from one backup run to the next - see this dba.se answer and comments for more details



回答3:

I like Jack's approach. You can query the table stats and know the number of inserts, updates, deletes and so:

select n_tup_upd from pg_stat_user_tables  where relname = 'YOUR_TABLE';

every update will increase the count by 1.

bare in mind this method is viable when you have a single DB. multiple instances will require different approach probably.



回答4:

See the following article:

MySQL versus PostgreSQL: Adding a 'Last Modified Time' Column to a Table http://www.pointbeing.net/weblog/2008/03/mysql-versus-postgresql-adding-a-last-modified-column-to-a-table.html



回答5:

You can write a stored procedure in an "untrusted language" (e.g. plpythonu): This allows access to the files in the postgres "base" directory. Return the larges mtime of these files in the stored procedure.

But this is only vague, since vacuum will change these files and the mtime.