Finding duplicate values in a SQL table

It's easy to find duplicates with one field:

SELECT email, COUNT(email)
FROM users
GROUP BY email
HAVING COUNT(email) > 1

For example, given this table:

ID   NAME   EMAIL
1    John   asd@asd.com
2    Sam    asd@asd.com
3    Tom    asd@asd.com
4    Bob    bob@asd.com
5    Tom    asd@asd.com

This query flags asd@asd.com, because John, Sam, Tom and Tom all share that email.
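The sketch below shows what grouping by email alone actually returns. It runs the single-field query against the sample table above. It assumes Python's built-in sqlite3 module and an in-memory SQLite database standing in for whatever engine you use. The table name `users` and the columns `id`, `name`, `email` come from the question.

```python
import sqlite3

# In-memory SQLite database holding the sample users table from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [(1, "John", "asd@asd.com"), (2, "Sam", "asd@asd.com"), (3, "Tom", "asd@asd.com"),
     (4, "Bob", "bob@asd.com"), (5, "Tom", "asd@asd.com")],
)

# Grouping by email alone yields one row per duplicated email, not one row per person.
single_field = conn.execute(
    "SELECT email, COUNT(email) FROM users GROUP BY email HAVING COUNT(email) > 1"
).fetchall()
print(single_field)  # [('asd@asd.com', 4)]
```

The count tells you that the email repeats, but not which name/email pairs are duplicated.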

However, what I want is to find rows that have both the same email and the same name.

That is, I want to get "Tom", "Tom".

The reason I need this: I made a mistake and allowed duplicate name and email values to be inserted. Now I need to remove or change the duplicates, so I have to find them first.

Tags: sql duplicates

25 answers

妖精总统

Reply #2 · 2018-12-31 03:15

This selects (or deletes) every duplicate record except one from each group of duplicates. The delete therefore leaves all unique records plus one record from each group of duplicates.

Select duplicates:

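SELECT *
FROM your_table  -- placeholder: TABLE itself is a reserved word
WHERE
    id NOT IN (
        -- keep the lowest id in each (column1, column2) group
        SELECT MIN(id)
        FROM your_table
        GROUP BY column1, column2
    );

Delete duplicates:

DELETE FROM your_table
WHERE
    id NOT IN (
        -- every row except the lowest id in its group is deleted
        SELECT MIN(id)
        FROM your_table
        GROUP BY column1, column2
    );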

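Be aware that on large tables this can cause performance problems. Also note that MySQL rejects this DELETE (error 1093) because the subquery reads from the table being modified. Wrapping the subquery in a derived table, SELECT id FROM (subquery) AS t, is the usual workaround.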
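The select/delete pair can be tried out on the question's sample data. This sketch assumes Python's built-in sqlite3 module, an in-memory SQLite database and the question's `users` table, with `name` and `email` standing in for `column1` and `column2`. SQLite, unlike MySQL, accepts a DELETE whose subquery reads the same table.

```python
import sqlite3

# Sample users table from the question, in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [(1, "John", "asd@asd.com"), (2, "Sam", "asd@asd.com"), (3, "Tom", "asd@asd.com"),
     (4, "Bob", "bob@asd.com"), (5, "Tom", "asd@asd.com")],
)

# A row is an extra copy if its id is not the smallest id in its (name, email) group.
NOT_FIRST_COPY = "id NOT IN (SELECT MIN(id) FROM users GROUP BY name, email)"

extra_copies = conn.execute(
    "SELECT id, name, email FROM users WHERE " + NOT_FIRST_COPY
).fetchall()

# Same predicate, now deleting: unique rows and the first copy of each group survive.
conn.execute("DELETE FROM users WHERE " + NOT_FIRST_COPY)
remaining_ids = [row[0] for row in conn.execute("SELECT id FROM users ORDER BY id")]

print(extra_copies)   # [(5, 'Tom', 'asd@asd.com')]
print(remaining_ids)  # [1, 2, 3, 4]
```

Running the SELECT first is a cheap way to review exactly which rows the DELETE will remove.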

梦醉为红颜

Reply #3 · 2018-12-31 03:16

SELECT
    name, email, COUNT(*)
FROM
    users
GROUP BY
    name, email
HAVING 
    COUNT(*) > 1

Simply group on both of the columns.
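Applied to the sample table from the question, grouping on both columns isolates the Tom rows. This is a sketch that assumes Python's built-in sqlite3 module and an in-memory SQLite database loaded with the question's `users` data.

```python
import sqlite3

# Sample users table from the question, in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [(1, "John", "asd@asd.com"), (2, "Sam", "asd@asd.com"), (3, "Tom", "asd@asd.com"),
     (4, "Bob", "bob@asd.com"), (5, "Tom", "asd@asd.com")],
)

# One output row per (name, email) pair that occurs more than once.
pair_counts = conn.execute("""
    SELECT name, email, COUNT(*)
    FROM users
    GROUP BY name, email
    HAVING COUNT(*) > 1
""").fetchall()
print(pair_counts)  # [('Tom', 'asd@asd.com', 2)]
```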

Note: the older ANSI standard requires every non-aggregated column to appear in the GROUP BY, but this changed with the idea of "functional dependency":

In relational database theory, a functional dependency is a constraint between two sets of attributes in a relation from a database. In other words, functional dependency is a constraint that describes the relationship between attributes in a relation.

Support is not consistent:

- Recent PostgreSQL supports it.
- SQL Server (as of SQL Server 2017) still requires all non-aggregated columns in the GROUP BY.
- MySQL is unpredictable, and you need sql_mode=only_full_group_by:
  - GROUP BY lname ORDER BY showing wrong results;
  - Which is the least expensive aggregate function in the absence of ANY() (see comments in accepted answer).
- Oracle isn't mainstream enough (warning: humour, I don't know about Oracle).

零度萤火

Reply #4 · 2018-12-31 03:17

Try the following:

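SELECT * FROM
(
    -- number the rows within each (Name, Age) group; row 1 is the lowest Id
    SELECT Id, Name, Age, Comments,
           ROW_NUMBER() OVER (PARTITION BY Name, Age ORDER BY Id) AS RowNum
    FROM Customers
) AS B
WHERE RowNum > 1;  -- rows 2, 3, ... are the extra copies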
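The same ROW_NUMBER() pattern can be applied to the question's table by partitioning on name and email instead of Name and Age. This is a sketch that assumes Python's built-in sqlite3 module with an SQLite build of version 3.25 or later, which is when SQLite added window functions. It uses an in-memory database loaded with the question's `users` data.

```python
import sqlite3

# Sample users table from the question, in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [(1, "John", "asd@asd.com"), (2, "Sam", "asd@asd.com"), (3, "Tom", "asd@asd.com"),
     (4, "Bob", "bob@asd.com"), (5, "Tom", "asd@asd.com")],
)

# Number rows inside each (name, email) group by id; anything past row 1 is a copy.
later_copies = conn.execute("""
    SELECT id, name, email
    FROM (
        SELECT id, name, email,
               ROW_NUMBER() OVER (PARTITION BY name, email ORDER BY id) AS row_num
        FROM users
    ) AS numbered
    WHERE row_num > 1
""").fetchall()
print(later_copies)  # [(5, 'Tom', 'asd@asd.com')]
```

Ordering by id inside the window makes the choice of which copy counts as "first" deterministic.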

旧人旧事旧时光

Reply #5 · 2018-12-31 03:17

This should also work; give it a try.

SELECT * FROM Users a
WHERE EXISTS (
    SELECT * FROM Users b
    WHERE a.name = b.name
      AND a.email = b.email   -- both columns must match (OR would also flag John and Sam)
      AND a.id <> b.id        -- ignore the row itself
);

This is especially useful if you are searching for duplicates that differ by a prefix or some systematic change, e.g. a new mail domain. You can then apply REPLACE() to those columns in the comparison.
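Here is a sketch of both the exact-match EXISTS query and the REPLACE() variant. It assumes Python's built-in sqlite3 module and an in-memory SQLite database. The question's rows are loaded together with one hypothetical extra row for Bob, whose address moved to a made-up domain, `new-asd.com`. That row and that domain are invented purely for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [(1, "John", "asd@asd.com"), (2, "Sam", "asd@asd.com"), (3, "Tom", "asd@asd.com"),
     (4, "Bob", "bob@asd.com"), (5, "Tom", "asd@asd.com"),
     (6, "Bob", "bob@new-asd.com")],  # hypothetical row after a domain change
)

# Exact duplicates: same name AND same email, excluding the row itself.
exact_ids = [row[0] for row in conn.execute("""
    SELECT a.id FROM users a
    WHERE EXISTS (
        SELECT 1 FROM users b
        WHERE a.name = b.name AND a.email = b.email AND a.id <> b.id
    )
    ORDER BY a.id
""")]

# Same check, but map the hypothetical new domain back to the old one first.
normalized_ids = [row[0] for row in conn.execute("""
    SELECT a.id FROM users a
    WHERE EXISTS (
        SELECT 1 FROM users b
        WHERE a.name = b.name
          AND REPLACE(a.email, '@new-asd.com', '@asd.com')
            = REPLACE(b.email, '@new-asd.com', '@asd.com')
          AND a.id <> b.id
    )
    ORDER BY a.id
""")]

print(exact_ids)       # [3, 5]
print(normalized_ids)  # [3, 4, 5, 6]
```

Unlike the GROUP BY answers, EXISTS returns every duplicate row (both Tom rows), which is convenient when you want to inspect all copies side by side.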

初与友歌

Reply #6 · 2018-12-31 03:18

SELECT
  FirstName, LastName, MobileNo, COUNT(1) as CNT 
FROM        
  CUSTOMER
GROUP BY
  FirstName, LastName, MobileNo 
HAVING
  COUNT(1) > 1;

余欢

Reply #7 · 2018-12-31 03:21

This is the simple approach I came up with. It uses a common table expression (CTE) and a windowed aggregate (OVER with PARTITION BY). Both are available in SQL Server 2005 and later.

This example finds all students with a duplicate name and date of birth. The fields you want to check for duplication go in the PARTITION BY of the OVER clause, and you can include any other fields you want in the projection.

WITH cte (StudentId, Fname, LName, DOB, RowCnt) AS (
    SELECT StudentId, FirstName, LastName, DateOfBirth,
           -- number of rows sharing this name and date of birth
           COUNT(*) OVER (PARTITION BY FirstName, LastName, DateOfBirth) AS RowCnt
    FROM tblStudent
)
SELECT * FROM cte
WHERE RowCnt > 1
ORDER BY DOB, LName;
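The same CTE plus windowed COUNT can be applied to the question's table by partitioning on name and email. This is a sketch that assumes Python's built-in sqlite3 module with SQLite 3.25 or later (for window functions). It uses an in-memory database loaded with the question's `users` data. The CTE name `counted` is a hypothetical choice.

```python
import sqlite3

# Sample users table from the question, in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [(1, "John", "asd@asd.com"), (2, "Sam", "asd@asd.com"), (3, "Tom", "asd@asd.com"),
     (4, "Bob", "bob@asd.com"), (5, "Tom", "asd@asd.com")],
)

# Attach the size of each (name, email) group to every row, then keep groups larger than 1.
dup_rows = conn.execute("""
    WITH counted AS (
        SELECT id, name, email,
               COUNT(*) OVER (PARTITION BY name, email) AS row_cnt
        FROM users
    )
    SELECT id, name, email, row_cnt
    FROM counted
    WHERE row_cnt > 1
    ORDER BY id
""").fetchall()
print(dup_rows)  # [(3, 'Tom', 'asd@asd.com', 2), (5, 'Tom', 'asd@asd.com', 2)]
```

Because the count is attached to each row rather than collapsed by GROUP BY, the ids of all copies stay available for a follow-up UPDATE or DELETE.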
