How can you find the rows with equal columns?

2019-06-24 14:17发布

If I have a table with important 2 columns,

CREATE TABLE foo (id INT, a INT, b INT, KEY a, KEY b);

How can I find all the rows that have both a and b being the same in both rows? For example, in this data set

id | a | b
----------
1  | 1 | 2
2  | 5 | 42
3  | 1 | 42
4  | 1 | 2 
5  | 1 | 2
6  | 1 | 42

I want to get back all rows except for id=2 since it is unique in (a,b). Basically, I want to find all offending rows that would stop a

ALTER TABLE foo ADD UNIQUE (a, b);

Something better than an n^2 for loop would be nice since my table has 10M rows.

For bonus points : How do I removed all but one of the rows (I don't care which ones, as long as one is left)

8条回答
贼婆χ
2楼-- · 2019-06-24 14:28
select * from foo where a = b

Or am I missing something?

===

Update for clarity:

select * from 
foo as a
inner join foo as b
on a.a = b.a AND b.a = b.b
and a.id != b.id

++++++++++ After 3rd clarity edit:

select f1.id
FROM foo as f1
INNER JOIN foo as f2
ON f1.a = f2.a AND f1.b=f2.b AND f1.id != f2.id

But I'm shot, so check it yourself.

查看更多
狗以群分
3楼-- · 2019-06-24 14:32
SELECT * 
FROM foo first
JOIN foo second
  ON ( first.a = second.a
       AND first.b = second.b ) 
  AND (first.id <> second.id )

Should come up with all the rows where more that one row has the same combination of a and b.

Just hope you have an index on columns a and b.

查看更多
我欲成王,谁敢阻挡
4楼-- · 2019-06-24 14:32

shouldn't this work?

SELECT * FROM foo WHERE a = b

=== edit ===

the how about

SELECT a, b FROM foo GROUP BY a, b HAVING COUNT(*) > 1

=== final re-edit before i give up on this question ===

SELECT foo.* FROM foo, (
   SELECT a, b FROM foo GROUP BY a, b HAVING COUNT(*) > 1
) foo2
WHERE foo.a = foo2.a AND foo.b = foo2.b
查看更多
Luminary・发光体
5楼-- · 2019-06-24 14:36

Try this:

    With s as (Select a,b from foo group by a,b having Count(1)>1)
Select foo.* from foo,s where foo.a=s.a and foo.b=s.b

This query should show duplicate rows in the table foo.

查看更多
对你真心纯属浪费
6楼-- · 2019-06-24 14:38

Your stated goal is to remove all duplicate combination of (a,b). For that, you can use a multi-table DELETE:

DELETE t1
  FROM foo t1
  JOIN foo t2 USING (a, b)
 WHERE t2.id > t1.id

Before you run it, you can check which rows will be removed with:

SELECT DISTINCT t1.id
  FROM foo t1
  JOIN foo t2 USING (a, b)
 WHERE t2.id > t1.id

The WHERE clause being t2.id > t1.id it will remove all but the one with the highest value for id. In your case, only the rows with id equal to 2, 5 or 6 would remain.

查看更多
放我归山
7楼-- · 2019-06-24 14:40

here's another approach

select * from foo f1 where exists(
  select * from foo f2 where
    f1.id != f2.id and
    f1.a = f2.a and
    f1.b = f2.b )

anyway, even though I find it a bit more readable, if you have such a huge table, you should check the execution plan, subqueries have a bad reputation involving performance...

you should also consider creating the index (without the unique clause, obviously) to speed up the query... for huge operations, sometimes it's better to spend the time creating the index, perform the update and then drop the index... in this case, I guess an index on (a, b) should certainly help a lot...

查看更多
登录 后发表回答