Remove duplicates in SSIS Data Flow

2019-02-17 08:00发布

问题:

I am working on an SSIS data flow task.

The source table is from old database which is denormalized.

The destination table is normalized.

SSIS fails because the data transfer is not possible because of duplicates (duplicates in primary key column).

It would be good if the SSIS can checks the destination for availability of current record (by checking the key) and if it exists , it can ignore pushing it. Then it can continue with the next record.

Is there a way to handle this scenario?

回答1:

Assuming your destination table is a subset of your source table, you should be able to use the Sort Transformation to pull in only the columns you need for your destination table, and then check the "Remove rows with duplicate sort values" to basically give you a distinct list of records based on the columns you selected.

Then, simply route the results of the sort to your destination, and you should be good to go.



标签: ssis