I have multiple processes that continually refresh data in Redshift. They start a transaction, create a new table, COPY all the data from S3 into the new table, then drop the old table and rename the new table to the old name.
pseudocode:
begin;
create table foo_temp (like foo);  -- clone the existing table's structure
copy foo_temp from 's3://...';     -- plus credentials and format options
drop table foo;
alter table foo_temp rename to foo;
commit;
I have several dozen tables that I update this way. This works well, but I would like to have multiple processes performing these refreshes, both for redundancy and to keep the data fresh (different processes can update different tables concurrently).
It works fine unless one process attempts to refresh a table that another process is already working on. In that case the second process is blocked by the first until the first commits, and at that point the second process gets the error:
ERROR: table 12345 dropped by concurrent transaction
Is there a simple way to guarantee that only one of my processes is refreshing a given table at a time, so that the second process never gets into this situation?
I considered creating a special lock table for each of my real tables: a process would LOCK the companion lock table before working on the real table. I think that would work, but I would like to avoid maintaining a separate lock table for every real table.
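For reference, a minimal sketch of that idea (foo_lock is a hypothetical name; Redshift's LOCK is held until the transaction commits, so concurrent refreshers of foo serialize on it):

-- one-time setup: one lock table per real table
create table foo_lock (dummy int);

-- each refresher takes the lock before touching foo
begin;
lock table foo_lock;               -- a second refresher blocks here until the first commits
create table foo_temp (like foo);
copy foo_temp from 's3://...';     -- plus credentials and format options
drop table foo;
alter table foo_temp rename to foo;
commit;                            -- releases the lock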
You need to protect readers from seeing the drop. Do this by swapping the data inside the live table rather than dropping and renaming the table itself:
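A sketch of that pattern, assuming a staging table named foo_staging (a hypothetical name). Because foo is never dropped, no transaction can see it disappear; the explicit LOCK makes readers and other refreshers wait briefly instead of erroring:

-- load outside the transaction so the lock is held only for the swap
create table foo_staging (like foo);
copy foo_staging from 's3://...';  -- plus credentials and format options

-- swap the contents, not the table
begin;
lock table foo;                    -- other transactions on foo wait here
delete from foo;                   -- truncate won't work: it commits immediately in Redshift
insert into foo select * from foo_staging;
commit;

drop table foo_staging;

Note that the delete leaves dead rows behind until a VACUUM runs, so this pattern needs periodic vacuuming on frequently refreshed tables.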