i have the following sample data with duplicate information:
ID Date Emp_ID Name Keep
---------------------------------------------------------
1 17/11/2010 13:45:22 101 AB *
2 17/11/2010 13:44:10 101 AB
3 17/11/2010 12:45:22 102 SF *
4 17/11/2010 12:44:10 102 SF
5 17/11/2010 11:45:22 103 RD *
6 17/11/2010 11:44:10 103 RD
Based on the above data set, how can I remove the duplicate Emp IDs and only keep the Emp IDs that have the maximum date/time specified?
So based on the above, I would only see IDs: 1, 3 and 5.
Thanks.
Something like:
DELETE FROM the_table_with_no_name
WHERE date_column != (SELECT MAX(t2.date_column)
FROM the_table_with_no_name t2
WHERE t2.id = the_table_with_no_name.id);
You can generate the ROWIDs of all rows other than the one with the maximum date (for a given EMPIds) and delete them. I have found this to be performant as it is a set-based approach and uses analytics, rowIDs.
--get list of all the rows to be deleted.
select row_id from (
select rowid row_id,
row_number() over (partition by emp_id order by date desc) rn
from <table_name>
) where rn <> 1
And then delete the rows.
delete from table_name where rowid in (
select row_id from (
select rowid row_id,
row_number() over (partition by emp_id order by date desc) rn
from <table_name>
) where rn <> 1
);