Fastest “Get Duplicates” SQL script

What is an example of a fast SQL query for finding duplicates in datasets with hundreds of thousands of records? I typically use something like:

SELECT afield1, afield2 FROM afile a 
WHERE 1 < (SELECT count(afield1) FROM afile b WHERE a.afield1 = b.afield1);

But this is quite slow.
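If the correlated subquery is kept, an index on afield1 often helps a lot. Without one, the database may scan the whole table again for every outer row. Here is a minimal, runnable sketch. It assumes SQLite through Python's standard sqlite3 module. The sample rows and the index name idx_afile_afield1 are hypothetical.

```python
import sqlite3

# Hypothetical sample data using the question's table and column names.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE afile (afield1 TEXT, afield2 TEXT)")
conn.executemany(
    "INSERT INTO afile VALUES (?, ?)",
    [("x", "1"), ("x", "2"), ("y", "3"), ("z", "4"), ("z", "5"), ("z", "6")],
)

# Hypothetical index name. With it, the inner count for each outer row can be
# answered from the index instead of scanning all of afile again.
conn.execute("CREATE INDEX idx_afile_afield1 ON afile (afield1)")

# The question's correlated subquery; ORDER BY only makes the output stable.
rows = conn.execute(
    """
    SELECT afield1, afield2 FROM afile a
    WHERE 1 < (SELECT count(afield1) FROM afile b WHERE a.afield1 = b.afield1)
    ORDER BY afield1, afield2
    """
).fetchall()
print(rows)
# [('x', '1'), ('x', '2'), ('z', '4'), ('z', '5'), ('z', '6')]
```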

Tags: sql scripting duplicates performance

5 answers

混吃等死

#2 · 2019-01-30 02:44

You could try:

select afield1, afield2 from afile a
where afield1 in
( select afield1
  from afile
  group by afield1
  having count(*) > 1
);
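The subquery here is not correlated. This lets the database compute the set of duplicated keys once and then filter afile against it, instead of recounting for every row. Here is a minimal sketch. It assumes SQLite via Python's sqlite3 module and uses hypothetical sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE afile (afield1 TEXT, afield2 TEXT)")
conn.executemany(
    "INSERT INTO afile VALUES (?, ?)",
    [("x", "1"), ("x", "2"), ("y", "3"), ("z", "4"), ("z", "5"), ("z", "6")],
)

# The inner query on its own: the afield1 values that occur more than once.
dup_keys = conn.execute(
    "SELECT afield1 FROM afile GROUP BY afield1 HAVING count(*) > 1 ORDER BY afield1"
).fetchall()

# The full answer: every row whose key is in that set of duplicated keys.
rows = conn.execute(
    """
    SELECT afield1, afield2 FROM afile a
    WHERE afield1 IN (
        SELECT afield1 FROM afile GROUP BY afield1 HAVING count(*) > 1
    )
    ORDER BY afield1, afield2
    """
).fetchall()

print(dup_keys)  # [('x',), ('z',)]
print(rows)      # [('x', '1'), ('x', '2'), ('z', '4'), ('z', '5'), ('z', '6')]
```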


(account banned)

#3 · 2019-01-30 02:44

By the way, if anyone wants to remove the duplicates, I have used this:

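-- Deletes only the highest MyTableID in each group of duplicates, so a row
-- stored three or more times needs repeated runs.
delete from MyTable where MyTableID in (
  select max(MyTableID)
  from MyTable
  group by Thing1, Thing2, Thing3
  having count(*) > 1
);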
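The delete statement above removes only the row with the highest MyTableID in each duplicate group. A row stored three times therefore still has two copies after one run. The following sketch checks that, using SQLite via Python's sqlite3 and hypothetical data. It also shows a different technique that does not come from the answer: keep the lowest MyTableID per group and delete every other row in a single pass.

```python
import sqlite3

def make_table():
    """Hypothetical table: ('a','b','c') stored three times, ('d','e','f') once."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE MyTable (MyTableID INTEGER PRIMARY KEY, Thing1, Thing2, Thing3)"
    )
    conn.executemany(
        "INSERT INTO MyTable (Thing1, Thing2, Thing3) VALUES (?, ?, ?)",
        [("a", "b", "c"), ("a", "b", "c"), ("a", "b", "c"), ("d", "e", "f")],
    )
    return conn

# The answer's statement: one pass deletes one extra copy per group.
conn = make_table()
conn.execute(
    """
    DELETE FROM MyTable WHERE MyTableID IN (
      SELECT max(MyTableID) FROM MyTable
      GROUP BY Thing1, Thing2, Thing3
      HAVING count(*) > 1
    )
    """
)
after_one_pass = conn.execute("SELECT count(*) FROM MyTable").fetchone()[0]

# Alternative (not from the answer): keep min(MyTableID) per group, delete the rest.
conn2 = make_table()
conn2.execute(
    """
    DELETE FROM MyTable WHERE MyTableID NOT IN (
      SELECT min(MyTableID) FROM MyTable
      GROUP BY Thing1, Thing2, Thing3
    )
    """
)
remaining = conn2.execute(
    "SELECT MyTableID, Thing1, Thing2, Thing3 FROM MyTable ORDER BY MyTableID"
).fetchall()

print(after_one_pass)  # 3
print(remaining)       # [(1, 'a', 'b', 'c'), (4, 'd', 'e', 'f')]
```

The NOT IN variant relies on MyTableID never being NULL, which holds for a primary key. MySQL rejects a DELETE whose subquery selects from the target table (error 1093). On MySQL, the subquery is usually wrapped in a derived table instead.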


唯我独甜

#4 · 2019-01-30 02:56

This is the more direct way:

select afield1, count(afield1) from afile
group by afield1 having count(afield1) > 1
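This query returns one row per duplicated key together with its count. It does not return the duplicate records themselves. Here is a minimal sketch, assuming SQLite via Python's sqlite3 module and hypothetical data. It also shows a subtle point: count(afield1) skips NULLs, so keys that are NULL never show up as duplicates unless count(*) is used.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE afile (afield1 TEXT, afield2 TEXT)")
# Hypothetical rows, including two whose afield1 is NULL.
conn.executemany(
    "INSERT INTO afile VALUES (?, ?)",
    [("x", "1"), ("x", "2"), ("y", "3"), ("z", "4"), ("z", "5"), ("z", "6"),
     (None, "7"), (None, "8")],
)

# The answer's query: one row per duplicated key, with how often it occurs.
counts = conn.execute(
    """
    SELECT afield1, count(afield1) FROM afile
    GROUP BY afield1 HAVING count(afield1) > 1
    ORDER BY afield1
    """
).fetchall()

# Same grouping with count(*), which also counts rows whose afield1 is NULL.
counts_star = conn.execute(
    """
    SELECT afield1, count(*) FROM afile
    GROUP BY afield1 HAVING count(*) > 1
    ORDER BY afield1
    """
).fetchall()

print(counts)       # [('x', 2), ('z', 3)]
print(counts_star)  # [(None, 2), ('x', 2), ('z', 3)]
```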


(account banned)

#5 · 2019-01-30 02:58

A similar question was asked last week. There are some good answers there.

SQL to find duplicate entries (within a group)

In that question, the OP was interested in all the columns (fields) in the table (file), but rows belonged in the same group if they had the same key value (afield1).

There are three kinds of answers:

Subqueries in the WHERE clause, like some of the other answers here.

An inner join between the table and the groups viewed as a table (my answer).

Analytic queries (something that's new to me).
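The join-based and analytic kinds of answer can be sketched as follows. As before, this assumes SQLite through Python's sqlite3 module and uses hypothetical sample data. The window-function (analytic) form needs SQLite 3.25 or later. Its count has to be filtered in an outer query, because window results cannot be referenced in the same query's WHERE clause.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE afile (afield1 TEXT, afield2 TEXT)")
conn.executemany(
    "INSERT INTO afile VALUES (?, ?)",
    [("x", "1"), ("x", "2"), ("y", "3"), ("z", "4"), ("z", "5"), ("z", "6")],
)

# Inner join between the table and the duplicate groups viewed as a table.
join_rows = conn.execute(
    """
    SELECT a.afield1, a.afield2
    FROM afile a
    JOIN (
        SELECT afield1 FROM afile
        GROUP BY afield1 HAVING count(*) > 1
    ) g ON g.afield1 = a.afield1
    ORDER BY a.afield1, a.afield2
    """
).fetchall()

# Analytic query: count rows per key without collapsing them, then filter.
window_rows = conn.execute(
    """
    SELECT afield1, afield2, n FROM (
        SELECT afield1, afield2,
               count(*) OVER (PARTITION BY afield1) AS n
        FROM afile
    ) t
    WHERE n > 1
    ORDER BY afield1, afield2
    """
).fetchall()

print(join_rows)    # [('x', '1'), ('x', '2'), ('z', '4'), ('z', '5'), ('z', '6')]
print(window_rows)  # [('x', '1', 2), ('x', '2', 2), ('z', '4', 3), ('z', '5', 3), ('z', '6', 3)]
```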


迷人小祖宗

#6 · 2019-01-30 03:03

This should be reasonably fast (even faster if the dupeFields are indexed).

-- A row is a duplicate if another row (different id) has the same values in both fields.
SELECT DISTINCT a.id, a.dupeField1, a.dupeField2
FROM TableX a
JOIN TableX b
ON a.dupeField1 = b.dupeField1
AND a.dupeField2 = b.dupeField2
AND a.id <> b.id

I guess the only downside to this query is that, because you're not doing a COUNT(*), you can't tell how many times a row is duplicated, only that it appears more than once.
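Here is a runnable check of the self-join. It assumes SQLite via Python's sqlite3 module, hypothetical rows, and a hypothetical composite index name. Comparing a.dupeField1 with b.dupeField2 instead of b.dupeField1 is an easy slip that silently misses duplicates, so the sketch also runs that variant for contrast.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE TableX (id INTEGER PRIMARY KEY, dupeField1 TEXT, dupeField2 TEXT)"
)
# Hypothetical rows: ids 1-2 are ('a','b'), id 3 is unique, ids 4-6 are ('d','d').
conn.executemany(
    "INSERT INTO TableX (dupeField1, dupeField2) VALUES (?, ?)",
    [("a", "b"), ("a", "b"), ("a", "c"), ("d", "d"), ("d", "d"), ("d", "d")],
)
# Composite index on the compared columns (hypothetical name).
conn.execute("CREATE INDEX idx_tablex_dupes ON TableX (dupeField1, dupeField2)")

SELF_JOIN = """
    SELECT DISTINCT a.id, a.dupeField1, a.dupeField2
    FROM TableX a
    JOIN TableX b
      ON a.dupeField1 = b.{left}
     AND a.dupeField2 = b.dupeField2
     AND a.id <> b.id
    ORDER BY a.id
"""

# Correct pairing of columns.
rows = conn.execute(SELF_JOIN.format(left="dupeField1")).fetchall()
# Mismatched pairing: only rows where both fields are equal to each other survive.
buggy_rows = conn.execute(SELF_JOIN.format(left="dupeField2")).fetchall()

print(rows)        # [(1, 'a', 'b'), (2, 'a', 'b'), (4, 'd', 'd'), (5, 'd', 'd'), (6, 'd', 'd')]
print(buggy_rows)  # [(4, 'd', 'd'), (5, 'd', 'd'), (6, 'd', 'd')]
```

DISTINCT is needed because a row with k copies matches k - 1 partners in the join and would otherwise appear k - 1 times.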
