I need to query a table for duplicate deposit records, where two deposits made at one cash terminal, for the same amount, within a certain time window, are considered duplicate records. I've started working on a query now, but I would appreciate any advice or suggestions on doing this 'properly'.
可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效,请关闭广告屏蔽插件后再试):
问题:
回答1:
Generally, you'd do a self join to the same table, and put your "duplicate" criteria in the join conditions.
E.g.
SELECT
*
FROM
Transactions t1
inner join
Transactions t2
on
t1.Terminal = t2.Terminal and
t1.Amount = t2.Amount and
DATEDIFF(minute,t2.TransactionDate,t1.TransactionDate) between 0 and 10 and
t1.TransactionID > t2.TransactionID /* prevent matching the same row */
回答2:
Simple aggregate
SELECT
col1, col2, col3, ...
FROM
MyTable
GROUP BY
col1, col2, col3, ...
HAVING
COUNT(*) >= 2
Don't include your identity/key/PK column: this will be unique per row and mess up the aggregate.
To get a row to remove or keep, do a MAX or MIN on that
SELECT
col1, col2, col3, ...,
MAX(IDCol) AS RowToDelete,
MIN(IDCol) AS RowToKeep
FROM
MyTable
GROUP BY
col1, col2, col3, ...
HAVING
COUNT(*) >= 2
Of course, with 3 duplicates then do a "keep".
Edit:
For rows within a time window, use a self join or window/ranking function