Phone Number Regular Expression (Regex) in Python

Posted 2019-02-27 11:27

Dive Into Python gives an amazing little tutorial on creating a regular expression for phone numbers: http://diveintopython3.ep.io/regular-expressions.html#phonenumbers

The final version comes out to look like:

phone_re = re.compile(r'(\d{3})\D*(\d{3})\D*(\d{4})\D*(\d*)$', re.VERBOSE)

This works fine for almost all the examples I can come up with. However, I found a pretty big failure that I can't seem to fix.

If a group of 3 digits comes before the phone number, it works fine. For example: "500 dollars off, call 123-456-7891"

If a group of 3 digits comes after the phone number, it fails. For example: "Call 123-456-7891 for a discount of up to 500"
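
A minimal reproduction of the two cases, assuming the pattern is applied with `re.search` (the post does not say which method is called). The variable names are illustrative only:

```python
import re

phone_re = re.compile(r'(\d{3})\D*(\d{3})\D*(\d{4})\D*(\d*)$', re.VERBOSE)

before = "500 dollars off, call 123-456-7891"
after = "Call 123-456-7891 for a discount of up to 500"

# Digits before the number: the fourth group (the extension) is empty.
groups_before = phone_re.search(before).groups()
print(groups_before)  # ('123', '456', '7891', '')

# Digits after the number: the last \D* skips over the words and the
# trailing 500 is captured as the fourth group.
groups_after = phone_re.search(after).groups()
print(groups_after)   # ('123', '456', '7891', '500')
```

Under that assumption, the second sentence does produce a match, but the trailing 500 is picked up as if it were an extension.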

Any ideas on a fix that would work for both examples?

2 Answers
看我几分像从前
Reply #2 · 2019-02-27 12:20

Here's your original, with some spaces (use re.VERBOSE, or remove the spaces):

(\d{3}) \D* (\d{3}) \D* (\d{4}) \D* (\d*)

The \D* will match any run of characters that are not digits, including whole words. Maybe you should try this:

(\d{3}) \W* (\d{3}) \W* (\d{4}) \W* (\d*)

The \W* matches any run of non-word characters, that is, anything other than letters, digits, and underscores. It will match (222) - 222 - 2222. However, it will not match if there is a letter between the numbers, as in (222) x 222 - 2222. The last part of the match (\d*) appears to be looking for an extension. These can be formatted in a variety of ways, so I suggest you either drop it or refine it based on how you expect your data to look. And, as Amber says, you should probably drop the $.
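
Here is a rough sketch of that suggestion, with the $ dropped and the pattern laid out for re.VERBOSE. The name `phone_w_re`, the comments inside the pattern, and the sample strings are my own illustrations, and `re.search` is assumed as the matching method:

```python
import re

phone_w_re = re.compile(r'''
    (\d{3})   # first group of three digits
    \W*       # separators: punctuation or spaces, but no letters
    (\d{3})   # second group of three digits
    \W*
    (\d{4})   # final four digits
    \W*
    (\d*)     # optional trailing digits (may be empty)
''', re.VERBOSE)

samples = [
    "500 dollars off, call 123-456-7891",
    "Call 123-456-7891 for a discount of up to 500",
    "(222) - 222 - 2222",
    "(222) x 222 - 2222",   # a letter between the groups
]

# Collect the captured groups, or None where the pattern does not match.
results = [m.groups() if (m := phone_w_re.search(s)) else None for s in samples]
for sample, result in zip(samples, results):
    print(sample, '->', result)
```

With these samples the first three yield the three number groups with an empty fourth group, and the last one yields None. Note that an extension introduced by a word such as "ext" would not be captured by this version either, since \W* stops at the first letter.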

劫难
Reply #3 · 2019-02-27 12:24

The (\d*)$ requires that the match run to the end of the string: after the last \D*, nothing but optional digits may remain (the $ anchors to the end of the string, or to the end of each line with re.MULTILINE). Try removing the $ if you're matching against a larger string where the phone number may not be at the end.
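
To illustrate the effect of the anchor, here is a small comparison, assuming `re.search` is used. The sample sentence and the names `anchored` and `unanchored` are hypothetical:

```python
import re

anchored = re.compile(r'(\d{3})\D*(\d{3})\D*(\d{4})\D*(\d*)$')
unanchored = re.compile(r'(\d{3})\D*(\d{3})\D*(\d{4})\D*(\d*)')

text = "Call 123-456-7891 by 5pm"  # hypothetical sample, ends in letters

# With $, the letters after the last digit prevent any match.
anchored_match = anchored.search(text)
print(anchored_match)  # None

# Without $, the number is found, though \D* still skips the word "by"
# and the following 5 lands in the fourth group.
unanchored_match = unanchored.search(text)
print(unanchored_match.groups())  # ('123', '456', '7891', '5')
```

Dropping the $ only removes the end-of-string requirement. The fourth group can still pick up later digits as long as the separators are written as \D*.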
