SSIS - Delete rows

2019-03-13 01:16发布

问题:

I'm new to SSIS and need help on this one. I found an article which describes how to detect rows which exist and which have changed. The part that I'm missing is how to update rows that changed. I found some articles which say that it's also good solution to delete records which have changed and insert new recordset. The thing is I don't know how to do that step of deleting (red box).

Any suggestions?

回答1:

If you have to delete the rows within Data Flow Task, then you need to use the OLE DB Command transformation and write a DELETE statement like DELETE FROM dbo.Table WHERE ColumnName = ?. Then in the column mappings of the OLE DB Command transformation, you will map the parameter represented by the question mark with the data that comes from the previous transformation. In your case, the data that comes from Union All 2.

However, I wouldn't recommend that option because OLE DB Command executes for every row and it might slow down your package if there are too many rows.

I would recommend something like this:

  1. Redirect the output from the Union All 2 to a temporary staging table (say dbo.Staging) using OLE DB Destination.

  2. Let's us assume that your final destination table is dbo.Destination. Now, your Staging table has all the records that should be deleted from the table Destination.

  3. On the Control Flow tab, place an Execute SQL Task after the Data Flow Task. In the Execute SQL Task, write an SQL statement or use a stored procedure that would call an SQL statement to join the records between Staging and Destination to delete all the matching rows from Destination table.

  4. Also, place another Execute SQL Task before the Data Flow Task. In this Execute SQL Task, delete/truncate rows from the Staging table.

Something like this might work to delete the rows:.

DELETE      D
FROM        dbo.Destination D
INNER JOIN  dbo.Staging     S
ON          D.DestinationId = S.StagingId

Hope that helps.



回答2:

In addition to user756519 answer. If you have millions of records to delete the last step (4) for ExecuteSQL Delete statement can be done in batches with something like this:

WHILE (1=1)
BEGIN
    DELETE D
    from dbo.Destination D
    inner join
    (
        -- select ids that should be removed from table
        SELECT TOP(10000) DestinationId
        FROM
            (
            SELECT
                D1.DestinationId,
                S.StagingId
            from 
                dbo.Destination as D1
            LEFT JOIN
                dbo.Staging as S
            ON
                D1.DestinationId = S.StagingId
            ) AS G
        WHERE
            StagingId IS NULL
    ) as R
    on D.DestinationId = R.DestinationId;

    IF @@ROWCOUNT < 1 BREAK

    -- info message
    DECLARE @timestamp VARCHAR(50)
    SELECT @timestamp = CAST(getdate() AS VARCHAR)
    RAISERROR ('Chunk deleted %s', 10, 1,@timestamp) WITH NOWAIT
END


回答3:

I recomend you to use Spoon (Kettle) it's more flexible, it has a design tool called "Dimension lookup/update" that helps you to maintenance a dimension table, you can specify which action you want depends of the column changes (update the record, insert new record) all these based on a time-version control of the records.