Delete duplicate records based on multiple columns

2020-06-08 05:31发布

In our system we run hourly imports from an external database. Due to an error in the import scripts, there are now some duplicate records.

A duplicate is deemed where any record has the same :legacy_id and :company.

What code can I run to find and delete these duplicates?

I was playing around with this:

Product.select(:legacy_id,:company).group(:legacy_id,:company).having("count(*) > 1")

It seemed to return some of the duplicates, but I wasn't sure how to delete from there?

Any ideas?

1条回答
手持菜刀,她持情操
2楼-- · 2020-06-08 06:17

You can try the following approach:

Product.where.not(
  id: Product.group(:legacy_id, :company).pluck('min(products.id)')
).delete_all

Or pure sql:

delete from products
where id not in ( 
   select min(p.id) from products p group by p.legacy_id, p.company
)
查看更多
登录 后发表回答