How do I append only distinct records from a master table to another table, when the master may have duplicates. Example - I only want the distinct records in the smaller table but I need to insert/append records to what I already have in the smaller table.
相关问题
- SQL join to get the cartesian product of 2 columns
- sql execution latency when assign to a variable
- Difference between Types.INTEGER and Types.NULL in
- php PDO::FETCH_ASSOC doesnt detect select after ba
- Bulk update SQL Server C#
You could use the
distinct
keyword to filter out duplicates:You don't say the scale of the problem so I'll mention something I recently helped a friend with.
He works for an insurance company that provides supplemental Dental and Vision benefits management for other insurance companies. When they get a new client they also get a new database that can have 10's of millions of records. They wanted to identify all possible dupes with the data they already had in a master database of 100's of millions of records.
The solution we came up with was to identify two distinct combinations of field values (normalized in various ways) that would indicate a high probability of a dupe. We then created a new table containing MD5 hashes of the combos plus the id of the master record they applied to. The MD5 columns were indexed. All new records would have their combo hashes computed and if either of them had a collision with the master the new record would be kicked out to an exceptions file for some human to deal with it.
The speed of this surprised the hell out of us (in a nice way) and it has had a very acceptable false-positive rate.
Based on Microsoft SQL Server and its Transact-SQL. Untested as always and the target_table has the same amount of rows as the source table (otherwise use columnnames between
INSERT INTO
andSELECT
Ignoring any concurency issues:
You can also rephrase it as a join:
If you don't like
NOT EXISTS
andEXCEPT
/MINUS
(cute, Remus!), you have alsoLEFT JOIN
solution:Something like this would work for SQL Server (you don't mention what RDBMS you're using):
Tune where appropriate, depending on exactly what defines "distinctness" for your table.