Fastest options for merging two tables in SQL Serv

2019-08-27 21:09发布

问题:

Consider two very large tables, Table A with 20 million rows in, and Table B which has a large overlap with TableA with 10 million rows. Both have an identifier column and a bunch of other data. I need to move all items from Table B into Table A updating where they already exist.

Both table structures
 - Identifier int
 - Date DateTime,
 - Identifier A
 - Identifier B
 - General decimal data.. (maybe 10 columns)

I can get the items in Table B that are new, and get the items in Table B that need to be updated in Table A very quickly, but I can't get an update or a delete insert to work quickly. What options are available to merge the contents of TableB into TableA (i.e. updating existing records instead of inserting) in the shortest time?

I've tried pulling out existing records in TableB and running a large update on table A to update just those rows (i.e. an update statement per row), and performance is pretty bad, even with a good index on it.

I've also tried doing a one shot delete of the different values out of TableA that exist in TableB and performance of the delete is also poor, even with the indexes dropped.

I appreciate that this may be difficult to perform quickly, but I'm looking for other options that are available to achieve this.

回答1:

Since you deal with two large tables, in-place updates/inserts/merge can be time consuming operations. I would recommend to have some bulk logging technique just to load a desired content to a new table and the perform a table swap:

Example using SELECT INTO:

SELECT * 
INTO NewTableA
FROM (
SELECT * FROM dbo.TableB b WHERE NOT EXISTS (SELECT * FROM dbo.TableA a WHERE a.id = b.id)
UNION ALL
SELECT * FROM dbo.TableA a
) d

exec sp_rename 'TableA', 'BackupTableA'
exec sp_rename 'NewTableA', 'TableA'

Simple or at least Bulk-Logged recovery is highly recommended for such approach. Also, I assume that it has to be done out of business time since plenty of missing objects to be recreated on a new tables: indexes, default constraints, primary key etc.



回答2:

A Merge is probably your best bet, if you want to both inserts and updates.

MERGE #TableB AS Tgt
    USING (SELECT * FROM #TableA) Src
    ON (Tgt.Identifier = SRc.Identifier)  
    WHEN MATCHED THEN   
        UPDATE SET Date = Src.Date, ...
    WHEN NOT MATCHED THEN  
        INSERT (Identifier, Date, ...)  
        VALUES (Src.Identifier, Src.Date, ...);

Note that the merge statement must be terminated with a ;