SQL: How to find duplicates based on two fields?

2020-02-03 10:37发布

I have rows in an Oracle database table which should be unique for a combination of two fields but the unique constrain is not set up on the table so I need to find all rows which violate the constraint myself using SQL. Unfortunately my meager SQL skills aren't up to the task.

My table has three columns which are relevant: entity_id, station_id, and obs_year. For each row the combination of station_id and obs_year should be unique, and I want to find out if there are rows which violate this by flushing them out with an SQL query.

I have tried the following SQL (suggested by this previous question) but it doesn't work for me (I get ORA-00918 column ambiguously defined):

SELECT
entity_id, station_id, obs_year
FROM
mytable t1
INNER JOIN (
SELECT entity_id, station_id, obs_year FROM mytable 
GROUP BY entity_id, station_id, obs_year HAVING COUNT(*) > 1) dupes 
ON 
t1.station_id = dupes.station_id AND
t1.obs_year = dupes.obs_year

Can someone suggest what I'm doing wrong, and/or how to solve this?

9条回答
\"骚年 ilove
2楼-- · 2020-02-03 11:31

Re-write of your query

SELECT
t1.entity_id, t1.station_id, t1.obs_year
FROM
mytable t1
INNER JOIN (
SELECT entity_id, station_id, obs_year FROM mytable 
GROUP BY entity_id, station_id, obs_year HAVING COUNT(*) > 1) dupes 
ON 
t1.station_id = dupes.station_id AND
t1.obs_year = dupes.obs_year

I think the ambiguous column error (ORA-00918) was because you were selecting columns whose names appeared in both the table and the subquery, but you did not specifiy if you wanted it from dupes or from mytable (aliased as t1).

查看更多
干净又极端
3楼-- · 2020-02-03 11:33

I thought a lot of the solutions here were cumbersome and tough to understand since I had a 3 column primary key constraint and needed to find the duplicates. So here's an option

SELECT id, name, value, COUNT(*) FROM db_name.table_name
GROUP BY id, name, value
HAVING COUNT(*) > 1
查看更多
手持菜刀,她持情操
4楼-- · 2020-02-03 11:36
  SELECT entity_id, station_id, obs_year
    FROM mytable
GROUP BY entity_id, station_id, obs_year
HAVING COUNT(*) > 1

Specify the fields to find duplicates on both the SELECT and the GROUP BY.

It works by using GROUP BY to find any rows that match any other rows based on the specified Columns. The HAVING COUNT(*) > 1 says that we are only interested in seeing any rows that occur more than 1 time (and are therefore duplicates)

查看更多
登录 后发表回答