What's the best way to dedupe a table?-第3页回答

What's the best way to dedupe a table?

2019-01-21 19:33发布

I've seen a couple of solutions for this, but I'm wondering what the best and most efficient way is to de-dupe a table. You can use code (SQL, etc.) to illustrate your point, but I'm just looking for basic algorithms. I assumed there would already be a question about this on SO, but I wasn't able to find one, so if it already exists just give me a heads up.

(Just to clarify - I'm referring to getting rid of duplicates in a table that has an incremental automatic PK and has some rows that are duplicates in everything but the PK field.)

标签： sql algorithm performance duplicates

14条回答

神经病院院长

2楼-- · 2019-01-21 20:17

SELECT DISTINCT <insert all columns but the PK here> FROM foo. Create a temp table using that query (the syntax varies by RDBMS but there's typically a SELECT … INTO or CREATE TABLE AS pattern available), then blow away the old table and pump the data from the temp table back into it.

0人赞添加讨论(0) 举报

倾城　Initia

3楼-- · 2019-01-21 20:17

You could generate a hash for each row (excluding the PK), store it in a new column (or if you can't add new columns, can you move the table to a temp staging area?), and then look for all other rows with the same hash. Of course, you would have to be able to ensure that your hash function doesn't produce the same code for different rows.

If two rows are duplicate, does it matter which you get rid of? Is it possible that other data are dependent on both of the duplicates? If so, you will have to go through a few steps:

Find the dupes
Choose one of them as dupeA to eliminate
Find all data dependent on dupeA
Alter that data to refer to dupeB
delete dupeA.

This could be easy or complicated, depending on your existing data model.

This whole scenario sounds like a maintenance and redesign project. If so, best of luck!!

0人赞添加讨论(0) 举报

上一页 1 2 3

What's the best way to dedupe a table?

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间