I have a task to complete where I have to import around 970,000 users from someone else's database into ours. In the source DB, there is a phone number field that contains multiple phone numbers concatenated into one hideous string.
Here are some data examples:
|Home: 555-555-5555 Office: (555)-555-5555 Work: 5555555555|
|Home: Office: 555\555-5555 Work: 555-555-5555|
|Office: 555-555-5555 Home: (555)555-5555 some Comment here|
The problems that I'm running into is
- The order of the numbers is inconsistent
- There are some free text comments strewn about
- Some of the phone numbers have different formatting.
I would really prefer to do this through SQL if possible and am pretty stumped on doing this in an efficient manner with minimal Manual adjustment.
In my DB, we have separate columns for each phone number type, so I basically need to split those strings into their appropriate columns.
Please tell me if I left anything out.
In this code, you will need 3 extra columns to store the new phone numbers
This is the logic in the code
Sample data:
Update:
Result:
Using T-SQL for this task would not be the best choice. The closest solution would be to create a CLR assembly which would utilise the RegEx functionality available in .NET.
However, you may also look into Data Quality Services. It is a SQL Server component which was created for this particular type of task - manual entry cleansing, unification, de-duplication, etc. It requires BI or Enterprise edtion of SQL Server, however.