Guidelines for using Merge task in SSIS

2019-07-07 02:23发布

问题:

I have a table with three fields, one an identity field, and I need to add some new records from a source that has the other two fields. I'm using SSIS, and I think I should use the merge tool, because one of the sources is not in the local database. But, I'm confused by the merge tool and the proper process.

I have my one source (an Oracle table), and I get two fields, well_id and well_name, with a sort after, sorting by well_id. I have the destination table (sql server), and I'm also using that as a source. It has three fields: well_key (identity field), well_id, and well_name, and I then have a sort task, sorting on well_id. Both of those are input to my merge task. I was going to output to a temporary table, and then somehow get the new records back into the sql server table.

Oracle Well                     SQL Well
     |                           |
     V                           V   
Sort Source                     Sort Well
     |                           |
     ------->  Merge* <-----------
                |
                V
              Temp well table

I suspect this isn't the best way to use this tool, however. What are the proper steps for a merge like this?

One of my reasons for questioning this method is that my merge has an error, telling me that the "Merge Input 2" must be sorted, but its source is a sort task, so it IS sorted.

Example data

SQL Well (before merge)
well_key   well_id   well_name
1          123       well k
2          292       well c
3          344       well t
5          439       well d

Oracle Well
well_id     well_name
123         well k
292         well c
311         well y
344         well t
439         well d
532         well j

SQL Well (after merge)
well_key   well_id   well_name
1          123       well k
2          292       well c
3          344       well t
5          439       well d
6          311       well y
7          532       well j

Would it be better to load my Oracle Well to a temporary local file, and then just use a sql insert statment on it?

回答1:

First of all, I would highly recommend getting rid of the sort transformation before merging. The sort is very expensive blocking asynchronous transformation and can be easily avoided by doing an order by and changing your ole db src advance properties of your ouput columns to issorted = true and changing the sortkeyposition property of that column to 1. Dont forget to add your Order by in your tsql since the ole db source does not automatically do this for you:

Synchronous vs Asynchronous

Oledb Src IsSorted Property

Then, I would test the performance of the pkg using the Merge. Looks as though the fields you are pulling are not that big in size; however, if there is 10's of millions of records, then I would also recommend creating a pkg that lands the data into two staging tables, joining them, and then inserting the data as you have suggested trying above to see if you get better performance using that approach.

Hope this helps.