I need a well-tested regular expression (.NET style preferred), or some other simple bit of code, that will parse a USA/CA phone number into its component parts, so that:
- 3035551234122
- 1-303-555-1234x122
- (303)555-1234-122
- 1 (303) 555 -1234-122
etc.
all parse into:
- AreaCode: 303
- Exchange: 555
- Suffix: 1234
- Extension: 122
None of the answers given so far were robust enough for me, so I kept looking for something better, and I found it:
Google's library for dealing with phone numbers (libphonenumber).
I hope you find it useful too.
This is the one I use:
^(?:(?:[\+]?(?<CountryCode>[\d]{1,3}(?:[ ]+|[\-.])))?[(]?(?<AreaCode>[\d]{3})[\-/)]?(?:[ ]+)?)?(?<Number>[a-zA-Z2-9][a-zA-Z0-9 \-.]{6,})(?:(?:[ ]+|[xX]|(?i:ext[\.]?)){1,2}(?<Ext>[\d]{1,5}))?$
I believe I got it from RegexLib.
This regex handles all four of your examples (note that it requires an extension of exactly three digits):
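using System.Text.RegularExpressions;

Regex regexObj = new Regex(@"\(?(?<AreaCode>[0-9]{3})\)?[-. ]?(?<Exchange>[0-9]{3})[-. ]*?(?<Suffix>[0-9]{4})[-. x]?(?<Extension>[0-9]{3})");
Match matchResult = regexObj.Match("1 (303) 555 -1234-122");
// The pattern is unanchored, so the leading "1 " is skipped.
// Each component is now available as a named group:
string areaCode  = matchResult.Groups["AreaCode"].Value;   // "303"
string exchange  = matchResult.Groups["Exchange"].Value;   // "555"
string suffix    = matchResult.Groups["Suffix"].Value;     // "1234"
string extension = matchResult.Groups["Extension"].Value;  // "122"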
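For what it's worth, here is a rough Python port of that pattern, run against all four sample inputs from the question. It is only a sketch: Python spells named groups `(?P<Name>...)` rather than `(?<Name>...)`, and the `parse_phone` helper and `PHONE_RE` names are made up for illustration.

```python
import re

# Same pattern as the .NET version, with Python-style named groups.
PHONE_RE = re.compile(
    r"\(?(?P<AreaCode>[0-9]{3})\)?[-. ]?"
    r"(?P<Exchange>[0-9]{3})[-. ]*?"
    r"(?P<Suffix>[0-9]{4})[-. x]?"
    r"(?P<Extension>[0-9]{3})"
)

def parse_phone(text):
    """Return the named groups as a dict, or None if nothing matches."""
    # search() is unanchored, so a leading "1 " or "1-" is simply skipped.
    match = PHONE_RE.search(text)
    return match.groupdict() if match else None

samples = [
    "3035551234122",
    "1-303-555-1234x122",
    "(303)555-1234-122",
    "1 (303) 555 -1234-122",
]
for sample in samples:
    print(sample, "->", parse_phone(sample))
```

Because the extension group is mandatory, a plain ten-digit number such as "303-555-1234" does not match at all; wrapping the last separator and group in an optional group would be one way to relax that.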
Strip out anything that's not a digit first. Then all your examples reduce to:
/^1?(\d{3})(\d{3})(\d{4})(\d*)$/
Supporting all country codes is a little more complicated, but the same general rule applies.
Here is a well-written library, used with GeoIP for instance:
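A minimal Python sketch of this strip-then-match idea, assuming North American numbers only. The `split_nanp` function name and the dictionary keys are just illustrative choices that mirror the question.

```python
import re

def split_nanp(raw):
    """Split a US/CA phone number into its parts, or return None."""
    # Step 1: throw away everything that is not a digit.
    digits = re.sub(r"\D", "", raw)
    # Step 2: optional leading 1, then 3 + 3 + 4 digits, then any extension.
    match = re.fullmatch(r"1?(\d{3})(\d{3})(\d{4})(\d*)", digits)
    if match is None:
        return None
    area, exchange, suffix, extension = match.groups()
    return {
        "AreaCode": area,
        "Exchange": exchange,
        "Suffix": suffix,
        "Extension": extension,  # empty string when there is no extension
    }

print(split_nanp("1 (303) 555 -1234-122"))
```

The optional leading `1` is unambiguous here because North American area codes never start with 1. Anything shorter than ten digits returns `None`.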
http://highway.to/geoip/numberparser.inc
Here's a method that is easier on the eyes, provided by the Z Directory (vettrasoft.com)
and geared towards American phone numbers:
string_o s2, s1 = "888/872.7676";
z_fix_phone_number (s1, s2);
cout << s2.print(); // prints "+1 (888) 872-7676"
phone_number_o pho = s2;
pho.store_save();
The last line stores the number in the database table "phone_number",
with column values country_code = "1", area_code = "888", exchange = "872",
etc.