Compare each records of same table in sql server a

2020-04-12 03:39发布

问题:

I have table like below.I wanted to get the duplicate records.Here the condition if date2 and date4 having same date OR dates within less than or equals to 10 days of each other then records are duplicate. I have around 2000 records in the DB.Showing here few sample example. Date1 can be ignored. It may be same date or may be different.

ID Number   Code Type   date1     Date2       date3        Date4     status   shortname     CP      deferred
1  EO2      C    TO   9/20/2000  9/1/2010   9/18/2010   9/1/2010     Archi    Ordinary  58.65586    0
2  EO2      C    TO   9/20/2000  9/5/2010   9/18/2010   9/5/2010     Archi    Ordinary  58.65586    0
3  EO2      C    TO   9/21/2000  9/10/2010  9/18/2010   9/10/2010    Archi    Ordinary  58.65586    0
4  EO2      C    TO   9/21/2000  9/24/2010  9/18/2010   9/24/2010    Archi    Ordinary  58.65586    0

I have written below query:

select * from T a
join T b on a.ID = b.ID
where a.[Number] = b.[Number] and a.ID >1

Also, I tied this:

SELECT * FROM T a WHERE a.Id IN (SELECT b.Id FROM T b)
EXCEPT
SELECT * FROM T a

The problem is I'm not able to find a way, where each row can be compared with each other with above date condition.I should get the result like below as duplicate:

Number  Code Type   date1     Date2       date3        Date4     status   shortname     CP      deferred
EO2     C    TO   9/20/2000  9/1/2010   9/18/2010   9/1/2010     Archi    Ordinary  58.65586    0
EO2     C    TO   9/20/2000  9/5/2010   9/18/2010   9/5/2010     Archi    Ordinary  58.65586    0
EO2     C    TO   9/21/2000  9/10/2010  9/18/2010   9/10/2010    Archi    Ordinary  58.65586    0

Please help.Thanks.

回答1:

You can use a join to search for other identical rows. A join tries to match all rows from the right hand table based on the on condition. For example:

select  *
from    YourTable t1
join    YourTable t2
on      t1.ID < t2.ID -- Must be different rows
                      -- Smaller than presents the duplicates once
                      -- Otherwise you'd get both 1,3 and 3,1
        and abs(datediff(day, t1.date2, t2.date2)) <= 10
        and abs(datediff(day, t1.date4, t2.date4)) <= 10
        and t1.Number = t2.Number
        and t1.Code = t2.Code
        and -- So on for every column that should be equal

To ignore a column, omit it from the on condition.



回答2:

This is how I solved this. Thanks for help.

select t1.Number,t1.Code,t1.Type,t1.date1,t1.Date2,t1.date3,t1.Date4,t1.stats,t1.shortn‌​ame,t1.CP, t1.deferred 
from T t1 join T t2 on 1=1 and t2.Number = t1.Number and t2.Code = t1.Code and t2.Type = t1.Type and t2.deferred = t1.eferred and t2.CP = t1.CP and t2.status = t1.status and abs(datediff(day,t1.Date2,t2.Date2)) <=10 and abs(datediff(day,t1.date3,t2.date3)) <=10 and abs(datediff(day,t1.Date4,t2.Date4)) <=10 group by t1.Number, t1.Code,t1.Type , t1.date1, t1.Date2,t1.date3,t1.Date4 ,t1.status,t1.shortname,t1.CP, t1.deferred having count(*) > 1