Dive Into Python 3 has an amazing little tutorial on building a regular expression for phone numbers: http://diveintopython3.ep.io/regular-expressions.html#phonenumbers
The final version comes out to look like:
phone_re = re.compile(r'(\d{3})\D*(\d{3})\D*(\d{4})\D*(\d*)$', re.VERBOSE)
This works for almost every example I can come up with, but I found a pretty big failure that I can't seem to fix.
If a group of 3 digits comes before the phone number, it works fine, e.g. "500 dollars off, call 123-456-7891"
If a group of 3 digits comes after the phone number, it fails, e.g. "Call 123-456-7891 for a discount of up to 500"
Any ideas on a fix that would work for both examples?
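For reference, here is a minimal sketch that reproduces the two cases. It assumes the pattern is applied with search() (the question doesn't say which method is used), and the variable names before and after are just illustrative. Under that assumption the "failure" is not a missed match: the trailing 500 ends up captured as the extension.

```python
import re

# Pattern from the question. re.VERBOSE is omitted here; it makes no
# difference for a pattern that contains no whitespace or comments.
phone_re = re.compile(r'(\d{3})\D*(\d{3})\D*(\d{4})\D*(\d*)$')

before = "500 dollars off, call 123-456-7891"             # 3 digits before the number
after = "Call 123-456-7891 for a discount of up to 500"   # 3 digits after the number

# Assumption: search() is used, so the match may start anywhere in the string.
print(phone_re.search(before).groups())  # ('123', '456', '7891', '')
print(phone_re.search(after).groups())   # ('123', '456', '7891', '500')
```

In the second case the last \D* skips over " for a discount of up to " and (\d*) then picks up 500 right before the $.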
Here's your original, with some spaces (use re.VERBOSE, or remove the spaces):
(\d{3}) \D* (\d{3}) \D* (\d{4}) \D* (\d*)$
The \D* will match anything that's not a digit, including words. Maybe you should try \W* in place of each \D*.
The \W* matches anything that's not a word character (letters, digits and the underscore are word characters). It will match (222) - 222 - 2222. However, it will not match if there is a letter between the numbers, as in (222) x 222 - 2222.
The last part of the match, (\d*), appears to be looking for an extension. These can be formatted in a variety of ways, so I suggest you either drop it or refine it based on how you expect your data to look. And, like Amber says, you should probably drop the $.

The (\d*)$ requires that the match run all the way to the end of the string you're matching against (the $ anchors the match at the end of the string). Since (\d*) can also match nothing, the only things allowed between the last four digits and the end are non-digits followed by an optional run of digits. Try removing the $ if you're matching against a larger string where the phone number may not be at the end.
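Putting both suggestions together, a rough sketch might look like the following. The names relaxed_re and no_anchor_re are made up for illustration, and search() is again assumed. It also shows that dropping the $ on its own is not enough while \D* is still allowed to skip over words.

```python
import re

# Hypothetical combined fix: \W* instead of \D*, and no trailing $.
relaxed_re = re.compile(r'(\d{3})\W*(\d{3})\W*(\d{4})\W*(\d*)')

samples = [
    "500 dollars off, call 123-456-7891",
    "Call 123-456-7891 for a discount of up to 500",
    "(222) - 222 - 2222",
    "(222) x 222 - 2222",   # letter between the groups: no match expected
]

results = []
for text in samples:
    m = relaxed_re.search(text)
    results.append(m.groups() if m else None)
    print(text, '->', results[-1])

# For comparison: only the $ removed, \D* kept. The last \D* still
# skips the words, so 500 is captured as the extension.
no_anchor_re = re.compile(r'(\d{3})\D*(\d{3})\D*(\d{4})\D*(\d*)')
print(no_anchor_re.search(samples[1]).groups())  # ('123', '456', '7891', '500')
```

With this version the extension group only picks up digits separated from the number by punctuation or whitespace, so how extensions are written in your data still decides whether (\d*) is worth keeping.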