Find duplicate rows with PostgreSQL

2019-01-10 05:17发布

We have a table of photos with the following columns:

id, merchant_id, url 

this table contains duplicate values for the combination merchant_id, url. so it's possible that one row appears more several times.

234 some_merchant  http://www.some-image-url.com/abscde1213
235 some_merchant  http://www.some-image-url.com/abscde1213
236 some_merchant  http://www.some-image-url.com/abscde1213

What is the best way to delete those duplications? (I use PostgreSQL 9.2 and Rails 3.)

3条回答
冷血范
2楼-- · 2019-01-10 05:57

I see a couple of options for you.

For a quick way of doing it, use something like this (it assumes your ID column is not unique as you mention 234 multiple times above):

CREATE TABLE tmpPhotos AS SELECT DISTINCT * FROM Photos;
DROP TABLE Photos;
ALTER TABLE tmpPhotos RENAME TO Photos;

Here is the SQL Fiddle.

You will need to add your constraints back to the table if you have any.

If your ID column is unique, you could do something like to keep your lowest id:

DELETE FROM P1  
USING Photos P1, Photos P2
WHERE P1.id > P2.id
   AND P1.merchant_id = P2.merchant_id  
   AND P1.url = P2.url;  

And the Fiddle.

查看更多
仙女界的扛把子
3楼-- · 2019-01-10 06:04

Here is my take on it.

select * from (
  SELECT id,
  ROW_NUMBER() OVER(PARTITION BY merchant_Id, url ORDER BY id asc) AS Row
  FROM Photos
) dups
where 
dups.Row > 1

Feel free to play with the order by to tailor the records you want to delete to your specification.

SQL Fiddle => http://sqlfiddle.com/#!15/d6941/1/0


SQL Fiddle for Postgres 9.2 is no longer supported; updating SQL Fiddle to postgres 9.3

查看更多
爱情/是我丢掉的垃圾
4楼-- · 2019-01-10 06:11

The second part of sgeddes's answer doesn't work on Postgres (the fiddle uses MySQL). Here is an updated version of his answer using Postgres: http://sqlfiddle.com/#!12/6b1a7/1

DELETE FROM Photos AS P1  
USING Photos AS P2
WHERE P1.id > P2.id
   AND P1.merchant_id = P2.merchant_id  
   AND P1.url = P2.url;  
查看更多
登录 后发表回答