How can I clear out duplicate entries in an archive table?

Published 2019-08-08 06:47

Question:

Some background info: I am working on updating an application for a company, and we have a gridview with several columns in it, ranging from labels to textfields to dropdowns. Each row in the gridview is for a different 'Entry', which resides in my 'Entry' table. When a user changed one of these columns (updated the value selected in a dropdown, changed text in a textfield, etc.), they WERE clicking an 'update' button, which then inserted the change into our 'EntryLog' table. This is how we archive all the changes made to a particular entry. The issue is that the old implementation of the update process created an archive record for ALL of the entries in the gridview on that page, even when the user only actually updated ONE entry. This meant that whenever a user updated one entry, our 'EntryLog' table was flooded with additional archive records that were in reality identical to the previous one (meaning nothing had actually changed for that entry).

Onto the question: I have already solved the issue by implementing functionality that only creates an archive record for the single row in the gridview that was modified, but now I am left with the task of cleaning our 'EntryLog' table of all of those falsely archived records. If I had to phrase what I am trying to do in English, I would say: I want to delete all of the logs (by LogID, the unique ID for each archive record) where every column value is the same as in the previous log for the same EntryID (the unique ID for each entry itself, meaning there are many instances of each EntryID in the 'EntryLog' table).
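For reference, a minimal sketch of what such a table might look like (the column names beyond LogID and EntryID are assumptions for illustration, not the actual schema):

-- hypothetical EntryLog layout, for illustration only
create table EntryLog (
    LogID int identity primary key,  -- unique ID for each archive record
    EntryID int,                     -- the entry this archive record belongs to
    SomeText varchar(100),           -- example archived column
    SomeValue int,                   -- example archived column
    LoggedAt datetime                -- when the archive record was written
)

The goal, then, is: within each EntryID, ordered by LoggedAt, delete any row whose archived column values match those of the row immediately before it.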

I am unsure how to even begin to safely phrase this in T-SQL and any help or nudge in the right direction would be greatly appreciated.

Answer 1:

Use the CHECKSUM() function to detect identical data without comparing every single column. (Note that CHECKSUM() can produce collisions, meaning two different rows can hash to the same value, so treat matches as candidates rather than a guarantee; HASHBYTES() gives a stronger comparison if that matters.)

Assume you have something like this:

create table #t (log_id int, c varchar(10), d int, log_date date)

insert #t values
(1, 'aaaaa', 1, '20140101'),
(1, 'aaaaa', 1, '20140102'),
(1, 'aaaaa', 1, '20140103'),
(1, 'bbbbb', 1, '20140104'),
(2, 'ccc', 10, '20140103'),
(2, 'cdd', 10, '20140105')

The second and third rows duplicate the first (same log_id and data; only the log_date differs). To clean the data, execute this statement:

with x as (
    -- number the rows within each (log_id, checksum-of-data) group, oldest first
    select *, row_number() over(partition by log_id, checksum(log_id, c, d) order by log_date) as rn
    from #t
)
-- keep the first row of each group and delete the repeats
delete x where rn > 1
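Since this delete is destructive, it may be worth previewing the candidates first; the same CTE works with the DELETE swapped for a SELECT:

with x as (
    select *, row_number() over(partition by log_id, checksum(log_id, c, d) order by log_date) as rn
    from #t
)
-- inspect the rows that would be removed
select * from x where rn > 1

Note that this approach removes every repeat within a log_id group, not only rows identical to the immediately preceding log; if only consecutive duplicates should go, compare each row to its predecessor with LAG() instead.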


Answer 2:

If I don't misunderstand, all you want to do is delete duplicate records, and from what I read each one is unique by ID, so you could use a delete query to remove them by row ID, something like this:

DELETE FROM <TableName>
WHERE <ID>= <value>;

Since you did specify that it created quite a few duplicates at once, you might want to look into deleting a specified range of IDs to remove them a bit quicker. That would look something like this:

DELETE FROM <TableName>
WHERE <ID> BETWEEN <value1> AND <value2>;
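If the cleanup spans a very large number of rows, deleting in batches keeps the transaction log and locking manageable. A minimal sketch of that pattern, assuming the question's EntryLog/LogID names and placeholder range values:

-- delete in chunks of 5000 until no matching rows remain
while 1 = 1
begin
    delete top (5000) from EntryLog
    where LogID between 1000 and 500000;  -- placeholder range

    if @@rowcount = 0 break;
end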