I am creating a script that for "merging" and deleting duplicate rows from a table. The table contains address information, and uses an integer field for storing information about the email as bit flags (column name lngValue). For example, lngValue & 1 == 1 means its the primary address.
There are instances of the same email being entered twice, but sometimes with different lngValues. To resolve this, I need to take the lngValue from all duplicates and assign them to one surviving record and delete the rest.
My biggest headache so far as been with the "merging" of the records. What I want to do is bitwise or all lngValues of duplicate records together. Here is what I have so far, which only finds the value of all lngValues bitwise or'ed together.
Warning: messy code ahead
declare @duplicates table
(
lngInternetPK int,
lngContactFK int,
lngValue int
)
insert into @duplicates (lngInternetPK, lngContactFK, lngValue)
(
select tblminternet.lngInternetPK, tblminternet.lngContactFK, tblminternet.lngValue from tblminternet inner join
(select strAddress, lngcontactfk, count(*) as count from tblminternet where lngValue & 256 <> 256 group by strAddress, lngcontactfk) secondemail
On tblminternet.strAddress = secondemail.strAddress and
tblminternet.lngcontactfk = secondemail.lngcontactfk
where count > 1 and tblminternet.strAddress is not null and tblminternet.lngValue & 256 <> 256 --order by lngContactFK, strAddress
)
update @duplicates set lngValue = t.val
from
(select (sum(dupes.lngValue) & 65535) as val from
(select here.lngInternetPK, here.lngContactFK, here.lngValue from tblminternet here inner join
(select strAddress, lngcontactfk, count(*) as count from tblminternet where lngValue & 256 <> 256 group by strAddress, lngcontactfk) secondemail
On here.strAddress = secondemail.strAddress and
here.lngcontactfk = secondemail.lngcontactfk
where count > 1 and here.strAddress is not null and here.lngValue & 256 <> 256) dupes, tblminternet this
where this.lngContactFK = dupes.lngContactFK
) t
where lngInternetPK in (select lngInternetPK from @duplicates)
Edit:
As requested here is some sample data:
Table Name: tblminternet
Column Names:
lngInternetPK
lngContactFK
lngValue
strAddress
Example row 1:
lngInternetPK: 1
lngContactFK: 1
lngValue: 33
strAddress: "me@myaddress.com"
Example row 2:
lngInternetPK: 2
lngContactFK: 1
lngValue: 40
strAddress: "me@myaddress.com"
If these two were merged here is the desired result:
lngInternetPK: 1
lngContactFK: 1
lngValue: 41
strAddress: "me@myaddress.com"
Other necessary rules:
Each contact can have multiple emails, but each email row must be distinct ( each email can only appear as one row).
I believe the following query gets you what you want. This routine assumes a max of two duplicate addresses per contact. If there's more than one dup per contact, the query will have to be modified. I hope this helps.
SQL Server
lacks native bitwise aggregates, that's why we need to emulate them.The main idea here is to generate a set of bits from
0
to15
, for each bit apply the bitmask to the value and selectMAX
(which will give us anOR
for a given bit), then select theSUM
(which will merge the bit masks).The we just update the first
lngInternetPK
for any given(lngContactFK, strValue)
with the new value oflngValue
, and delete all duplicates.See this article in my blog for more detailed explanations:
You can create SQL Server Aggregate functions in .NET that you can then implement in SQL server inline. I think this requires a minimum of SQL server 2005 and Visual Studio 2010. I did one using Visual Studio 2013 Community Edition (free even for commercial use) for use with .NET 2 and SQL Server 2005.
See the MSDN article: https://msdn.microsoft.com/en-us/library/91e6taax(v=vs.90).aspx
First you'll need to enable the CLR feature in SQL server: https://msdn.microsoft.com/en-us/library/ms131048.aspx
Now you can write your bitwise code in VB:
Now deploy it: https://msdn.microsoft.com/en-us/library/dahcx0ww(v=vs.90).aspx
Register the DLL:
CREATE ASSEMBLY [CLRTools] FROM ‘c:CLRTools.dll’ WITH PERMISSION_SET = SAFE
Create the aggregate in SQL:
CREATE AGGREGATE [dbo].[AggregateBitwiseOR](@value INT) RETURNS INT EXTERNAL NAME [CLRTools].[CLRTools.AggregateBitwiseOR];
If you get the error "Incorrect syntax near 'EXTERNAL'" then change the database compatibility level using following commands:
For SQL Server 2005: EXEC sp_dbcmptlevel 'DatabaseName', 90
For SQL Server 2008: EXEC sp_dbcmptlevel 'DatabaseName', 100
Test your code:
SELECT dbo.AggregateBitwiseOR(Foo) AS Foo FROM Bar
I found this article helpful: http://www.codeproject.com/Articles/37377/SQL-Server-CLR-Functions