I have the below table with the below records in it
create table employee
(
EmpId number,
EmpName varchar2(10),
EmpSSN varchar2(11)
);
insert into employee values(1, 'Jack', '555-55-5555');
insert into employee values (2, 'Joe', '555-56-5555');
insert into employee values (3, 'Fred', '555-57-5555');
insert into employee values (4, 'Mike', '555-58-5555');
insert into employee values (5, 'Cathy', '555-59-5555');
insert into employee values (6, 'Lisa', '555-70-5555');
insert into employee values (1, 'Jack', '555-55-5555');
insert into employee values (4, 'Mike', '555-58-5555');
insert into employee values (5, 'Cathy', '555-59-5555');
insert into employee values (6 ,'Lisa', '555-70-5555');
insert into employee values (5, 'Cathy', '555-59-5555');
insert into employee values (6, 'Lisa', '555-70-5555');
I dont have any primary key in this table .But i have the above records in my table already. I want to remove the duplicate records which has the same value in EmpId and EmpSSN fields.
Ex : Emp id 5
Can any one help me to frame a query to delete those duplicate records
Thanks in advance
Use the row number to differentiate between duplicate records. Keep the first row number for an EmpID/EmpSSN and delete the rest:
You could create a temporary table
#tempemployee
containing aselect distinct
of youremployee
table. Thendelete from employee
. Theninsert into employee select from #tempemployee
.Like Josh said - even if you know the duplicates, deleting them will be impossile since you cannot actually refer to a specific record if it is an exact duplicate of another record.
Code
Explanation
Use an inner query to construct a view over the table which includes a field based on
Row_Number()
, partitioned by those columns you wish to be unique.Delete from the results of this inner query, selecting anything which does not have a row number of 1; i.e. the duplicates; not the original.
The
order by
clause of the row_number window function is needed for a valid syntax; you can put any column name here. If you wish to change which of the results is treated as a duplicate (e.g. keep the earliest or most recent, etc), then the column(s) used here do matter; i.e. you want to specify the order such that the record you wish to keep will come first in the result.Having a database table without Primary Key is really and will say extremely BAD PRACTICE...so after you add one (ALTER TABLE)
Run this until you don't see any more duplicated records (that is the purpose of HAVING COUNT)
MAX([Id]) will cause to delete latest records (ones added after first created) in case you want the opposite meaning that in case of requiring deleting first records and leave the last record inserted please use MIN([Id])
Using this we can remove duplicate records
ITS easy use below query