Check if table data has changed?

2019-09-02 12:46发布

I am pulling the data from several tables and then passing the data to a long running process. I would like to be able to record what data was used for the process and then query the database to check if any of the tables have changed since the process was last run.

Is there a method of solving this problem that should work across all sql databases?

One possible solution that I've thought of is having a separate table that is only used for keeping track of whether the data has changed since the process was run. The table contains a "stale" flag. When I start running the process, stale is set to false. If any creation, update, or deletion occurs in any of the tables on which the operation depends, I set stale to true. Is this a valid solution? Are there better solutions?

One concern with my solution is situations like this:

One user starts inserting a new row into one of the tables. Stale gets set to true, but the new row has not actually been added yet. Another user has simultaneously started the long running process, pulling the data from the tables and setting the flag to false. The row is finally added. Now the data used for the process is out of date but the flag indicates it is not stale. Would transactions be able to solve this problem?

EDIT:

This is some SQL for my idea. Not sure if it works, but just to give you a better idea of what I was thinking:

# First transaction reads the data and sets the flag to false
BEGIN TRANSACTION
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
  UPDATE flag SET stale = false
  SELECT * FROM DATATABLE  
COMMIT TRANSACTION

# Second transaction updates the data and sets the flag to true
BEGIN TRANSACTION
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
  UPDATE data SET val = 15 WHERE ID = 10
  UPDATE flag SET stale = true
COMMIT TRANSACTION    

I do not have much experience with transactions or handwriting xml, so there are probably issues with this. From what I understand two serializable transactions can not be interleaved. Please correct me if I'm wrong.

Is there a way to accomplish this with only the first transaction? The process will be run rarely, but the updates to the data table will occur more frequently, so it would be nice to not lock up the data table when performing updates.

Also, is the SET TRANSACTION ISOLATION syntax specific to MS?

标签: sql
1条回答
趁早两清
2楼-- · 2019-09-02 13:29

The stale flag will probably work, but a timestamp would be better since it provides more metadata about the age of the records which could be used to tune your queries, e.g., only pull data that is over 5 minutes old.

To address your concern about inserting a row at the same time a query is run, transactions with an appropriate isolation level will help. For row inserts, updates, and selects, at least use a transaction with an isolation level that prevents dirty reads so that no other connections can see the updated data until the transaction is committed.

If you are strongly concerned about the case where an update happens at the same time as a record pull, you could use the REPEATABLE READ or even SERIALIZABLE isolation levels, but this will slow DB access down.

Your SQLServer sampled should work. For alternate databases, Here's an example that works in PostGres:

Transaction 1

BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
-- run queries that update the tables, then set last_updated column
UPDATE sometable SET last_updatee = now() WHERE id = 1;;
COMMIT;

Transaction 2

BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
-- select data from tables, then set last_queried column
UPDATE sometable SET last_queried = now() WHERE id = 1;
COMMIT;

If transaction 1 starts, and then transaction 2 starts before transaction 1 has completed, transaction 2 will block during on the update, and then will throw an error when transaction 1 is committed. If transaction 2 starts first, and transaction 1 starts before that has finished, then transaction 1 will error. Your application code or process should be able to handle those errors.

Other databases use similar syntax - MySQL (with InnoDB plugin) requires you to set the isolation level before you start the transaction.

查看更多
登录 后发表回答