I have an SSIS routine that reads from a very dynamic table and inserts whichever rows it finds into a table in a different database, before truncating the original source table.
Due to the dynamic nature of the source table this truncation not surprisingly leads to rows not making it to the second database.
What is the best way of deleting only those rows that have been migrated?
There is an identity column on the source table but it is not migrated across.
I can't change either table schema.
A option, that might sound stupid but it works, is to delete first and use the OUTPUT clause.
I created a simple control flow that populates a table for me.
In my OLE DB Source, instead of using a SELECT, DELETE the data you want to go down the pipeline and OUTPUT the DELETED virtual table. Somethinng like
It works, it works!
One option would be to create a table to log the identity of your processed records into, and then a separate package (or dataflow) to delete those records. If you're already logging processed records somewhere then you could just add the identity there - otherwise, create a new table to store the data.
A second option: If you're trying to avoid creating additional tables, then separate the record selection and record processing into two stages. Broadly, you'd select all your records in the control flow, then process them on-by-one in the dataflow. Specifically:
There's quite a few examples of this online; e.g. this one from the venerable Jamie Thomson, or this one which includes a bit more detail.
Note that you didn't talk about the scale of the data; if you have very large numbers of records the first suggestion is likely a better choice. Note that in both cases you lose the advantage of the table truncation (because you're using a standard
delete
call).