How to select sequential duplicates in SQL Server

2019-07-20 15:05发布

问题:

I would like to select duplicate entries from a SQL Server table, but only if the id is consecutive.

I have been trying to twist this answer to my needs, but I can't get it to work.

The above answer is for Oracle, but I see that SQL Server also has lead and lag functions.

Also, I think that the answer above puts a * next to duplicates, but I only want to select the duplicates.

select 
    id, companyName, 
    case 
       when companyName in (prev, next) 
          then '*' 
    end match, 
    prev, 
    next 
from 
    (select
         id,
         companyName,
         lag(companyName, 1) over (order by id) prev,
         lead(companyName, 1) over (order by id) next
     from
         companies)
order by
    id;

Example:

So from this data set:

id      companyName
-------------------    
1       dogs ltd
2       cats ltd
3       pigs ltd
4       pigs ltd
5       cats ltd
6       cats ltd
7       dogs ltd
8       pigs ltd

I want to select:

id      companyName
-------------------    
3       pigs ltd
4       pigs ltd
5       cats ltd
6       cats ltd

Update

Every now and again I am taken aback by the quantity and quality of answers I get on SO. This is one of those times. I don't have the level of expertise to judge one answer as being better than another, so I've gone for SqlZim as this was the first working answer I saw. But it's great to see the different approaches. Especially when only an hour ago I was wondering "is this even possible?".

回答1:

This is a gaps and islands style problem, but instead of using two row_numbers(), we use the id and row_number() in the innermost subquery. Followed by count() over() to get the count per grp, and finally return those with a cnt > 1.

select id, companyname 
from (
  select 
      id
    , companyName
    , grp
    , cnt = count(*) over (partition by companyname, grp)
  from (
    select *
      , grp = id - row_number() over (partition by companyname order by id)
    from
      companies
    ) islands
  ) d
where cnt  > 1
order by id

rextester demo: http://rextester.com/ACP73683

returns:

+----+-------------+
| id | companyname |
+----+-------------+
|  3 | pigs ltd    |
|  4 | pigs ltd    |
|  5 | cats ltd    |
|  6 | cats ltd    |
+----+-------------+


回答2:

You are very close to what you want:

select id, companyName
from (select c.*,
             lag(companyName, 1) over (order by id) prev,
             lead(companyName, 1) over (order by id) next
      from companies c
     ) a
where CompanyName in (prev, next)
order by id;


回答3:

One more alternate form, using LEAD() and LAG() (SQL 2012 and up)

SELECT id, CompanyName
FROM (
    SELECT *,
        LEAD(CompanyName, 1) OVER(ORDER BY id) as nc,
        LAG(CompanyName, 1) OVER(ORDER BY id) AS pc
    FROM #t t
    ) x
WHERE nc = companyName
    OR pc = companyName

Here is the test data, so you can check it out yourself.

CREATE TABLE #T (id int not null PRIMARY KEY, companyName varchar(16) not null)

INSERT INTO #t Values 
(1,       'dogs ltd'),
(2,       'cats ltd'),
(3,       'pigs ltd'),
(4,       'pigs ltd'),
(5,       'cats ltd'),
(6,       'cats ltd'),
(7,       'dogs ltd'),
(8,       'pigs ltd')


回答4:

In the WHERE clause you just need to limit to those where the companyName is the same as the prev or the next

select id, companyName
from (
   select id, companyName,
   lag(companyName, 1) over (order by id) as prev,
   lead(companyName, 1) over (order by id) as next
   from companies
 ) q
 where companyName in (prev, next)
 order by id;

To make sure the id's are really without gaps then you can do it like this:

select id, companyName
from (
   select id, companyName,
   lag(concat(id+1,companyName), 1) over (order by id) as prev,
   lead(concat(id-1,companyName), 1) over (order by id) as next
   from companies
 ) q
 where concat(id,companyName) in (prev, next)
 order by id;


回答5:

You can use Row_Number() and get the duplicates based on partition by clause

;with cte as (
SELECT id, companyName,
    RowN = Row_Number() over (partition by id order by companynae) from #yourTable
    )
    Select * from cte where RowN > 1

Can you provide your input and expected output to verify this query