i'm trying to optimize a query that is in production which is taking a long time. The goal is to find duplicate records based on matching field values criteria and then deleting them. The current query uses a self join via inner join on t1.col1 = t2.col1 then a where clause to check the values.
select * from table t1
inner join table t2 on t1.col1 = t2.col1
where t1.col2 = t2.col2 ...
What would be a better way to do this? Or is it all the same based on indexes? Maybe
select * from table t1, table t2
where t1.col1 = t2.col1, t2.col2 = t2.col2 ...
this table has 100m+ rows.
MS SQL, SQL Server 2008 Enterprise
select distinct t2.id
from table1 t1 with (nolock)
inner join table1 t2 with (nolock) on t1.ckid=t2.ckid
left join table2 t3 on t1.cid = t3.cid and t1.typeid = t3.typeid
where
t2.id > @Max_id and
t2.timestamp > t1.timestamp and
t2.rid = 2 and
isnull(t1.col1,'') = isnull(t2.col1,'') and
isnull(t1.cid,-1) = isnull(t2.cid,-1) and
isnull(t1.rid,-1) = isnull(t2.rid,-1)and
isnull(t1.typeid,-1) = isnull(t2.typeid,-1) and
isnull(t1.cktypeid,-1) = isnull(t2.cktypeid,-1) and
isnull(t1.oid,'') = isnull(t2.oid,'') and
isnull(t1.stypeid,-1) = isnull(t2.stypeid,-1)
and (
(
t3.uniqueoid = 1
)
or
(
t3.uniqueoid is null and
isnull(t1.col1,'') = isnull(t2.col1,'') and
isnull(t1.col2,'') = isnull(t2.col2,'') and
isnull(t1.rdid,-1) = isnull(t2.rdid,-1) and
isnull(t1.stid,-1) = isnull(t2.stid,-1) and
isnull(t1.huaid,-1) = isnull(t2.huaid,-1) and
isnull(t1.lpid,-1) = isnull(t2.lpid,-1) and
isnull(t1.col3,-1) = isnull(t2.col3,-1)
)
)