MS SQL — How to Extract Phone Number From Long Inc

I have a task to complete where I have to import around 970,000 users from someone else's database into ours. In the source DB, there is a phone number field that contains multiple phone numbers concatenated into one hideous string.

Here are some data examples:

|Home: 555-555-5555 Office: (555)-555-5555 Work: 5555555555|
|Home: Office: 555\555-5555 Work: 555-555-5555|
|Office: 555-555-5555 Home: (555)555-5555 some Comment here|

The problems that I'm running into is

The order of the numbers is inconsistent
There are some free text comments strewn about
Some of the phone numbers have different formatting.

I would really prefer to do this through SQL if possible and am pretty stumped on doing this in an efficient manner with minimal Manual adjustment.

In my DB, we have separate columns for each phone number type, so I basically need to split those strings into their appropriate columns.

Please tell me if I left anything out.

标签： sql-server parsing concatenation phone-number

2条回答

Melony?

2楼-- · 2019-07-26 09:12

In this code, you will need 3 extra columns to store the new phone numbers

This is the logic in the code

splitting the phone numbers

cutting off text before first 3 last 3 numerics in the split result

deleting the alien characters used in phone number(only those used in sample)

inserting replacement separators '-' at position 7 and 4

grouping up data

updating table

Sample data:

DECLARE @t table
  (phone varchar(500), home varchar(50), work varchar(50), office varchar(50))
INSERT @t(phone) values
('Home: 555-555-5551 Office: (555)-555-5555 Work: 5555555552|'),
('|Home: Office: 555\555-5555 Work: 555-555-5555|'),
('|Office: 555-555-5555 Home: (555)555-5555 some Comment here|')

Update:

;WITH CTE as
(
     SELECT
       nid,work, home, office, 
       t.c.value('.', 'VARCHAR(2000)') phone
     FROM (
         SELECT
           row_number() over(order by (select 1)) nid, work, home,office, 
           x = CAST('<t>' + 
               REPLACE(REPLACE(REPLACE(phone, 'Work', '</t><t>work')
               ,'Office', '</t><t>Office'), 'Home', '</t><t>Home')
                + '</t>' AS XML)
        FROM @t -- replace @t with your table
     ) a
     CROSS APPLY x.nodes('/t') t(c)
     WHERE t.c.value('.', 'VARCHAR(2000)') like '%[0-9][0-9][0-9]%'
), CTE2 as
(
SELECT 
work,max(case when phone like '%work%' then z end) over(partition by nid)nwork,
home,max(case when phone like '%home%' then z end) over(partition by nid)nhome,
office,max(case when phone like '%office%' then z end) over(partition by nid)noffice
FROM cte t
CROSS APPLY(SELECT REVERSE(SUBSTRING(phone,PATINDEX('%[0-9][0-9][0-9]%', phone), 20))x)y
CROSS APPLY(SELECT STUFF(STUFF(REPLACE(REPLACE(REPLACE(REVERSE(
  SUBSTRING(x, PATINDEX('%[0-9][0-9][0-9]%', x), 20)), ')', ''), '\', ''),
  '-', ''),7,0, '-'),4,0,'-')z)v    )
UPDATE CTE2
SET work = nwork, home = nhome, office = noffice

SELECT home,work,office FROM @t

Result:

home          work          office
555-555-5551  555-555-5552  555-555-5555
NULL          555-555-5555  555-555-5555
555-555-5555  NULL          555-555-5555

0人赞添加讨论(0) 举报

Luminary・发光体

3楼-- · 2019-07-26 09:14

Using T-SQL for this task would not be the best choice. The closest solution would be to create a CLR assembly which would utilise the RegEx functionality available in .NET.

However, you may also look into Data Quality Services. It is a SQL Server component which was created for this particular type of task - manual entry cleansing, unification, de-duplication, etc. It requires BI or Enterprise edtion of SQL Server, however.

0人赞添加讨论(0) 举报

MS SQL — How to Extract Phone Number From Long Inc

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间