I'm new to SSIS and need help on this one. I found an article which describes how to detect rows which exist and which have changed. The part that I'm missing is how to update rows that changed. I found some articles which say that it's also good solution to delete records which have changed and insert new recordset. The thing is I don't know how to do that step of deleting (red box).
Any suggestions?
If you have to delete the rows within Data Flow Task, then you need to use the OLE DB Command
transformation and write a DELETE statement like DELETE FROM dbo.Table WHERE ColumnName = ?
. Then in the column mappings of the OLE DB Command transformation, you will map the parameter represented by the question mark with the data that comes from the previous transformation. In your case, the data that comes from Union All 2.
However, I wouldn't recommend that option because OLE DB Command executes for every row and it might slow down your package if there are too many rows.
I would recommend something like this:
Redirect the output from the Union All 2
to a temporary staging table (say dbo.Staging) using OLE DB Destination.
Let's us assume that your final destination table is dbo.Destination. Now, your Staging table has all the records that should be deleted from the table Destination.
On the Control Flow tab, place an Execute SQL Task
after the Data Flow Task
. In the Execute SQL Task, write an SQL statement or use a stored procedure that would call an SQL statement to join the records between Staging and Destination to delete all the matching rows from Destination table.
Also, place another Execute SQL Task before the Data Flow Task. In this Execute SQL Task, delete/truncate rows from the Staging table.
Something like this might work to delete the rows:.
DELETE D
FROM dbo.Destination D
INNER JOIN dbo.Staging S
ON D.DestinationId = S.StagingId
Hope that helps.
In addition to user756519 answer. If you have millions of records to delete the last step (4) for ExecuteSQL Delete statement can be done in batches with something like this:
WHILE (1=1)
BEGIN
DELETE D
from dbo.Destination D
inner join
(
-- select ids that should be removed from table
SELECT TOP(10000) DestinationId
FROM
(
SELECT
D1.DestinationId,
S.StagingId
from
dbo.Destination as D1
LEFT JOIN
dbo.Staging as S
ON
D1.DestinationId = S.StagingId
) AS G
WHERE
StagingId IS NULL
) as R
on D.DestinationId = R.DestinationId;
IF @@ROWCOUNT < 1 BREAK
-- info message
DECLARE @timestamp VARCHAR(50)
SELECT @timestamp = CAST(getdate() AS VARCHAR)
RAISERROR ('Chunk deleted %s', 10, 1,@timestamp) WITH NOWAIT
END
I recomend you to use Spoon (Kettle) it's more flexible, it has a design tool called "Dimension lookup/update" that helps you to maintenance a dimension table, you can specify which action you want depends of the column changes (update the record, insert new record) all these based on a time-version control of the records.