I am performing data clean up and one of my tasks is to delete similar duplicate contacts.
EXAMPLE:
BILL CROSBIE, BILL CROSBY, BILL CROSSBY; or KRISTEN HARRIS, KRISTIN HARIS.
So, there is no exact rule, but by manually scanning this, I can tell that they are very similar and must be duplicates.
Can anyone, provide an example of how I can do this using SSIS.
I understand that I can use the fuzzy lookup, but it requires a reference table or a reference data that is correct and would then compare to the table that needs data cleanup. However, is there a possibility that I can use the script component tool in SSIS to use an alogirthm that gets the characters with most matches. What would that C# code look like?
I am new to using SSIS and don't have much experience. Or is there some sort of script I can create in MSSQL that can do this?