Update column to be different aggregate values

2020-02-26 12:32发布

I am creating a script that for "merging" and deleting duplicate rows from a table. The table contains address information, and uses an integer field for storing information about the email as bit flags (column name lngValue). For example, lngValue & 1 == 1 means its the primary address.

There are instances of the same email being entered twice, but sometimes with different lngValues. To resolve this, I need to take the lngValue from all duplicates and assign them to one surviving record and delete the rest.

My biggest headache so far as been with the "merging" of the records. What I want to do is bitwise or all lngValues of duplicate records together. Here is what I have so far, which only finds the value of all lngValues bitwise or'ed together.

Warning: messy code ahead

declare @duplicates table
(
lngInternetPK int,
lngContactFK int,
lngValue int
)

insert into @duplicates (lngInternetPK, lngContactFK, lngValue) 
(
select  tblminternet.lngInternetPK, tblminternet.lngContactFK, tblminternet.lngValue   from tblminternet  inner join 
(select strAddress, lngcontactfk, count(*) as count from tblminternet where lngValue & 256 <> 256 group by strAddress, lngcontactfk) secondemail
On tblminternet.strAddress = secondemail.strAddress and
tblminternet.lngcontactfk = secondemail.lngcontactfk 
where count > 1 and tblminternet.strAddress is not null and tblminternet.lngValue & 256 <> 256 --order by lngContactFK, strAddress
)

update @duplicates set lngValue = t.val

from 
                (select (sum(dupes.lngValue) & 65535) as val from 
                    (select  here.lngInternetPK,                     here.lngContactFK, here.lngValue from tblminternet here  inner join 
                    (select strAddress, lngcontactfk, count(*) as count from tblminternet where lngValue & 256 <> 256 group by strAddress, lngcontactfk) secondemail
                    On here.strAddress = secondemail.strAddress     and
                    here.lngcontactfk = secondemail.lngcontactfk 
                    where count > 1 and here.strAddress is not      null and here.lngValue & 256 <> 256) dupes, tblminternet this

                where this.lngContactFK = dupes.lngContactFK
                ) t
where lngInternetPK in (select lngInternetPK from @duplicates)    

Edit:
As requested here is some sample data:

Table Name: tblminternet
Column Names:
lngInternetPK
lngContactFK
lngValue
strAddress

Example row 1:
lngInternetPK: 1
lngContactFK: 1
lngValue: 33
strAddress: "me@myaddress.com"

Example row 2:
lngInternetPK: 2
lngContactFK: 1
lngValue: 40
strAddress: "me@myaddress.com"

If these two were merged here is the desired result:
lngInternetPK: 1
lngContactFK: 1
lngValue: 41
strAddress: "me@myaddress.com"

Other necessary rules:
Each contact can have multiple emails, but each email row must be distinct ( each email can only appear as one row).

3条回答
▲ chillily
2楼-- · 2020-02-26 13:19

I believe the following query gets you what you want. This routine assumes a max of two duplicate addresses per contact. If there's more than one dup per contact, the query will have to be modified. I hope this helps.

Declare @tblminternet 
Table 
( lngInternetPK int,   
  lngContactFK int,  
  lngValue int, 
  strAddress varchar(255)
)

Insert Into @tblminternet 
select 1, 1, 33, 'me@myaddress.com' 
union
select 2, 1, 40, 'me@myaddress.com'
union 
select 3, 2, 33, 'me@myaddress2.com'
union 
select 4, 2, 40, 'me@myaddress2.com'
union 
select 5, 3, 2, 'me@myaddress3.com'

--Select * from @tblminternet

Select  Distinct   
    A.lngContactFK , 
    A.lngValue | B.lngValue as 'Bitwise OR', 
    A.strAddress
From @tblminternet A, @tblminternet B
Where A.lngContactFK = B.lngContactFK
And A.strAddress = B.strAddress
And A.lngInternetPK != B.lngInternetPK
查看更多
Explosion°爆炸
3楼-- · 2020-02-26 13:24

SQL Server lacks native bitwise aggregates, that's why we need to emulate them.

The main idea here is to generate a set of bits from 0 to 15, for each bit apply the bitmask to the value and select MAX (which will give us an OR for a given bit), then select the SUM (which will merge the bit masks).

The we just update the first lngInternetPK for any given (lngContactFK, strValue) with the new value of lngValue, and delete all duplicates.

;WITH   bits AS
        (
        SELECT  0 AS b
        UNION ALL
        SELECT  b + 1
        FROM    bits
        WHERE   b < 15
        ),
        v AS
        (
        SELECT  i.*,
                (
                SELECT  SUM(value)
                FROM    (
                        SELECT  MAX(lngValue & POWER(2, b)) AS value
                        FROM    tblmInternet ii
                        CROSS JOIN
                                bits
                        WHERE   ii.lngContactFK = i.lngContactFK
                                AND ii.strAddress = i.strAddress
                        GROUP BY
                                b
                        ) q
                ) AS lngNewValue
        FROM    (
                SELECT  ii.*, ROW_NUMBER() OVER (PARTITION BY lngContactFK, strAddress ORDER BY lngInternetPK) AS rn
                FROM    tblmInternet ii
                ) i
        WHERE   rn = 1
        )
UPDATE  v
SET     lngValue = lngNewValue;

;WITH    v AS
        (
        SELECT  ii.*, ROW_NUMBER() OVER (PARTITION BY lngContactFK, strAddress ORDER BY lngInternetPK) AS rn
        FROM    tblmInternet ii
        )
DELETE  v
WHERE   rn > 1

See this article in my blog for more detailed explanations:

查看更多
小情绪 Triste *
4楼-- · 2020-02-26 13:27

You can create SQL Server Aggregate functions in .NET that you can then implement in SQL server inline. I think this requires a minimum of SQL server 2005 and Visual Studio 2010. I did one using Visual Studio 2013 Community Edition (free even for commercial use) for use with .NET 2 and SQL Server 2005.

See the MSDN article: https://msdn.microsoft.com/en-us/library/91e6taax(v=vs.90).aspx

First you'll need to enable the CLR feature in SQL server: https://msdn.microsoft.com/en-us/library/ms131048.aspx

sp_configure 'show advanced options', 1;
GO
RECONFIGURE;
GO
sp_configure 'clr enabled', 1;
GO
RECONFIGURE;
GO
  1. Create a SQL Server -> SQL Server Database Project
  2. Right-click on the new project and select Properties
  3. Configure the targeted SQL Server version under Project Settings
  4. Configure the targeted CLR language under SQL CLR (such as VB)
  5. Right-click on the new project and select Add -> New Item...
  6. When the dialog pops up, select SQL Server -> SQL CLR VB -> SQL CLR VB Aggregate

Now you can write your bitwise code in VB:

Imports System
Imports System.Data
Imports System.Data.SqlClient
Imports System.Data.SqlTypes
Imports Microsoft.SqlServer.Server


<Serializable()> _
<Microsoft.SqlServer.Server.SqlUserDefinedAggregate(Format.Native)> _
Public Structure AggregateBitwiseOR

    Private CurrentAggregate As SqlTypes.SqlInt32

    Public Sub Init()
        CurrentAggregate = 0
    End Sub

    Public Sub Accumulate(ByVal value As SqlTypes.SqlInt32)
        'Perform Bitwise OR against aggregate memory
        CurrentAggregate = CurrentAggregate OR value
    End Sub

    Public Sub Merge(ByVal value as AggregateBitwiseOR)
        Accumulate(value.Terminate())
    End Sub

    Public Function Terminate() As SqlInt32
        Return CurrentAggregate
    End Function

End Structure

Now deploy it: https://msdn.microsoft.com/en-us/library/dahcx0ww(v=vs.90).aspx

  1. Build the project using the menu bar: Build -> Build ProjectName (if the build fails with error 04018 then download a new version of the data tools @ http://msdn.microsoft.com/en-US/data/hh297027 or by going to the menu bar: Tools -> Extensions And Updates, then under updates select update for Microsoft SQL Server Update For Database Tooling)
  2. Copy your compiled DLL to C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\Binn and to C:\
  3. Register the DLL:

    CREATE ASSEMBLY [CLRTools] FROM ‘c:CLRTools.dll’ WITH PERMISSION_SET = SAFE

  4. Create the aggregate in SQL:

    CREATE AGGREGATE [dbo].[AggregateBitwiseOR](@value INT) RETURNS INT EXTERNAL NAME [CLRTools].[CLRTools.AggregateBitwiseOR];

If you get the error "Incorrect syntax near 'EXTERNAL'" then change the database compatibility level using following commands:

For SQL Server 2005: EXEC sp_dbcmptlevel 'DatabaseName', 90

For SQL Server 2008: EXEC sp_dbcmptlevel 'DatabaseName', 100

  1. Test your code:

    SELECT dbo.AggregateBitwiseOR(Foo) AS Foo FROM Bar

I found this article helpful: http://www.codeproject.com/Articles/37377/SQL-Server-CLR-Functions

查看更多
登录 后发表回答