DELETE all duplicate topics with few conditions

2019-02-14 16:07发布

I'm trying to make sql who will delete all duplicate titles BUT must delete duplicates with these conditions:

  • must delete only duplicates with same object_id
  • must keep only the newest record (biggest topic_id) (topic_id is the unique id for every topic AI)

So far I've done that (testing with select...)

SELECT topic_id,object_id,title,url,date 
FROM topics GROUP BY title 
HAVING ( COUNT(title) > 1) 
ORDER BY topic_id DESC

But doesn't meet the conditions.
I'm using mysql.

4条回答
我只想做你的唯一
2楼-- · 2019-02-14 16:31

In MySQL, you cannot specify the target table to a DML operation in a subquery (unless you nest it more than one level deep, but in this case you won't get reliable results and cannot use correlated subqueries).

Use a JOIN:

DELETE  td
FROM    topics td
JOIN    topics ti
ON      ti.object_id = td.object_id
        AND ti.title = td.title
        AND ti.topic_id > td.topic_id;

Create an index on topics (object_id, title, topic_id) for this to work fast.

查看更多
叛逆
3楼-- · 2019-02-14 16:41

This will delete all duplicate object_id records save the one with the highest topic_id.

delete from topics outer where exists
(
    select 1 from topics inner where 
        outer.object_id = inner.object_id AND 
        inner.topic_id  < outer.topic_id
)
查看更多
Root(大扎)
4楼-- · 2019-02-14 16:48

WITH tbl AS (SELECT topic_id, object_id, row_number() over(partition by object_id order by topic_id DESC) as rnum
FROM topics) DELETE tbl WHERE rnum > 1

For more information please check this article: http://blog.sqlauthority.com/2009/06/23/sql-server-2005-2008-delete-duplicate-rows/

查看更多
叛逆
5楼-- · 2019-02-14 16:49

First, if you have a date field, you'd do better identifying the newest records by their date.

This will work :

SELECT topic_id, object_id, title, url, date 
FROM   topics earlier
WHERE  EXISTS 
    (SELECT newest.topic_id 
     FROM   topics newest 
     WHERE  newest.date      > earlier.date 
     AND    newest.object_id = earlier.object_id)

You're selecting the lines for which exists another line with the same object_id and a more recent date.

查看更多
登录 后发表回答