MySQL cleanup table from duplicated entries AND re

Here is my situation: I have 2 tables, patient and study.

Each table has its own PK using autoincrement.

In my case, the pat_id should be unique. It's not declared as unique at database level since it could be non unique is some uses (it's not a home made system). I found out how to configure the system to consider the pat_id as unique, but I need now to cleanup the database for duplicated patients AND relink duplicated patients in study table to remaining unique patient, before deleting the duplicated patients.

Patient table:

CREATE TABLE `patient` (
  `pk` BIGINT(20) NOT NULL AUTO_INCREMENT,
  `pat_id` VARCHAR(250) COLLATE latin1_bin DEFAULT NULL,
...
  `pat_name` VARCHAR(250) COLLATE latin1_bin DEFAULT NULL,
...
  `pat_custom1` VARCHAR(250) COLLATE latin1_bin DEFAULT NULL
....
  PRIMARY KEY (`pk`)
)ENGINE=InnoDB;

Study table:

CREATE TABLE `study` (
  `pk` BIGINT(20) NOT NULL AUTO_INCREMENT,
  `patient_fk` BIGINT(20) DEFAULT NULL,
...
  PRIMARY KEY (`pk`),
...
  CONSTRAINT `patient_fk` FOREIGN KEY (`patient_fk`) REFERENCES `patient` (`pk`)
)ENGINE=InnoDB;

I found some similar questions, but not exactly the same issue, especially it was missing the link of the foreign keys to the remaining unique patient.

Cleanup Update for Duplicate Entries

Update only first record from duplicate entries in MySQL

标签： mysql foreign-keys duplicates duplicate-removal

1条回答

在下西门庆

2楼-- · 2019-03-02 02:22

This is how I did.

I reused an unused field in patient table to mark non duplicated (N), 1st of duplicated (X), and other duplicated patients (Y). You could also add a column for this (and drop it after use).

Here are the steps I followed to cleanup my database:

/*1: List duplicated */
select pk,pat_id, t.`pat_id_issuer`, t.`pat_name`, t.pat_custom1
from patient t
where pat_id in (
select pat_id from (
select pat_id, count(*)
from patient 
group by 1
having count(*)>1
) xxx);    

/*2: Delete orphan patients */
delete from patient where pk not in (select patient_fk from study);

/*3: Reset flag for duplicated (or not) patients*/
update patient t set t.`pat_custom1`='N';

/*4: Mark all duplicated */
update patient t set t.`pat_custom1`='Y' 
where pat_id in (
select pat_id from (
select pat_id, count(*)
from patient 
group by 1
having count(*)>1
) xxx) ;

/*5: Unmark the 1st of the duplicated*/
update patient t 
join (select pk from (
select min(pk) as pk, pat_id from patient 
where  pat_custom1='Y'  
group by pat_id
) xxx ) x
on (x.pk=t.pk)
set t.`pat_custom1`='X' 
where  pat_custom1='Y'
  ;

/*6: Verify update is correct*/
select pk, pat_id,pat_custom1  
from `patient` 
where  pat_custom1!='N'
order by pat_id, pat_custom1;

/*7: Verify studies linked to duplicated patient */
select p.* from study s
join patient p on (p.pk=s.patient_fk)
where p.pat_custom1='Y';

/*8: Relink duplicated patients */
update study s
join patient p on (p.pk=s.patient_fk)
set patient_fk = (select pk from patient pp
where pp.pat_id=p.pat_id and pp.pat_custom1='X')
where p.pat_custom1='Y';

/*9: Delete newly orphan patients */
delete from patient where pk not in (select patient_fk from study);

/* 10: reset flag */
update patient t set t.`pat_custom1`=null;

/* 11: Commit changes */
commit;

There is certainly a shorter way, with a some smarter (complicated?) SQL, but I personally prefer the simple way. This also allows me to check each step is doing what I expect.

0人赞添加讨论(0) 举报

MySQL cleanup table from duplicated entries AND re

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间