Regex for NANP phone number with non-repeating las

Posted 2019-09-16 08:21

Question:

I have to validate a 10-digit (US) phone number in the NANP format (no special characters allowed) in .NET, and also check that the last 7 digits of the phone number are non-repeating. So far, I have written the following regex to validate the NANP format:

^(?:[2-9][0-8][0-9])([2-9][0-9]{2}[0-9]{4})$

How do I modify this regex to also account for non-repeating last 7 digits? Please note that using two regexes is not an option due to constraints of existing code.

Edit: I have to check for consecutive duplicates across all 7 digits. For example, 2062222222 should be considered invalid, whereas 2062221234 or 2062117777 should be considered valid.

Thanks

Answer 1:

Are you talking about consecutive repeating digits, or do all seven digits have to be unique? For example:

2342497553  // consecutive duplicates
2345816245  // non-consecutive duplicates
2345816249  // no duplicates

This regex filters out consecutive duplicates:

^(?:[2-9][0-8][0-9])(?!.*(\d)\1)([2-9][0-9]{2}[0-9]{4})$

...while this one disallows any duplicate digits:

^(?:[2-9][0-8][0-9])(?!.*(\d).*\1)([2-9][0-9]{2}[0-9]{4})$

After the first three digits have been consumed, the lookahead tries to find a digit that's repeated, either immediately, as in (?!.*(\d)\1), or with optional intervening characters, as in (?!.*(\d).*\1). Because it's a negative lookahead, the overall match fails whenever the lookahead succeeds.
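As a rough check of the two patterns above, here is a minimal sketch in Python rather than .NET. It assumes Python's `re` module treats these particular lookaheads and backreferences the same way the .NET engine does; the names `NO_CONSECUTIVE`, `NO_DUPLICATES`, and `passes` are made up for illustration.

```python
import re

# Patterns copied verbatim from the answer; only the names are invented.
NO_CONSECUTIVE = re.compile(r"^(?:[2-9][0-8][0-9])(?!.*(\d)\1)([2-9][0-9]{2}[0-9]{4})$")
NO_DUPLICATES = re.compile(r"^(?:[2-9][0-8][0-9])(?!.*(\d).*\1)([2-9][0-9]{2}[0-9]{4})$")


def passes(pattern, number):
    """Return True when the whole number matches the given pattern."""
    return pattern.match(number) is not None


# The three sample numbers listed earlier in the answer.
samples = ["2342497553", "2345816245", "2345816249"]

# Map each sample to (passes NO_CONSECUTIVE, passes NO_DUPLICATES).
results = {n: (passes(NO_CONSECUTIVE, n), passes(NO_DUPLICATES, n)) for n in samples}

for number, outcome in results.items():
    print(number, outcome)
```

The first sample is rejected by both patterns because of the adjacent 55, the second only by the stricter pattern because 5 appears twice, and the third passes both.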


EDIT: It turns out the problem is simpler than I thought. To filter out numbers like 2345555555, where the last seven digits are identical, use this:

^(?:[2-9][0-8][0-9])(?!(\d)\1+$)([2-9][0-9]{2}[0-9]{4})$

It's important to include the end anchor ($), because without that it would fail to match valid numbers like 2345555556. Alternatively, you could tell it to look for exactly six more of the captured digit: (?!(\d)\1{6}).
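The effect of the end anchor can be seen in a small sketch, again in Python on the assumption that its `re` module agrees with .NET for these constructs. `ANCHORED` and `COUNTED` are the two forms from the answer; `UNANCHORED` is a deliberately broken variant, and all three names are hypothetical.

```python
import re

# Lookahead with the end anchor, as given in the answer.
ANCHORED = re.compile(r"^(?:[2-9][0-8][0-9])(?!(\d)\1+$)([2-9][0-9]{2}[0-9]{4})$")
# Same lookahead with the anchor removed, to show why it matters.
UNANCHORED = re.compile(r"^(?:[2-9][0-8][0-9])(?!(\d)\1+)([2-9][0-9]{2}[0-9]{4})$")
# Alternative from the answer: exactly six more copies of the captured digit.
COUNTED = re.compile(r"^(?:[2-9][0-8][0-9])(?!(\d)\1{6})([2-9][0-9]{2}[0-9]{4})$")


def passes(pattern, number):
    """Return True when the whole number matches the given pattern."""
    return pattern.match(number) is not None


numbers = ["2345555555", "2345555556", "2062222222", "2062221234", "2062117777"]

for n in numbers:
    print(n, passes(ANCHORED, n), passes(UNANCHORED, n), passes(COUNTED, n))
```

Without the anchor, any number whose last seven digits merely start with a doubled digit (such as 2345555556 or 2062221234) is wrongly rejected, while the anchored and counted forms reject only the numbers whose last seven digits are all identical.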



Answer 2:

I'm pretty sure the non-repeating portion of this came up last night, and the general consensus was that regular expressions can't handle non-repetition directly; you'd have to put in an unmanageably large number of alternative cases. I don't think I've actually seen it proven, but I'm pretty sure it's true. It boils down to the fact that regular expressions have no memory. I suggest you use the regexp to validate the format and run the number through a separate algorithm to check for repetition.
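The two-step approach suggested here might look like the following sketch. It is written in Python rather than .NET, the function name `is_valid_number` is hypothetical, and it assumes "repetition" means what the question's edit describes: all seven trailing digits being identical.

```python
import re

# Format-only pattern taken from the question; the group captures the last seven digits.
NANP_FORMAT = re.compile(r"^(?:[2-9][0-8][0-9])([2-9][0-9]{2}[0-9]{4})$")


def is_valid_number(number):
    """Check the NANP format with the regex, then check repetition separately."""
    match = NANP_FORMAT.fullmatch(number)
    if match is None:
        return False
    last_seven = match.group(1)
    # Assumption: a number is rejected only when all seven digits are the same,
    # so more than one distinct digit is enough to pass.
    return len(set(last_seven)) > 1


for n in ["2062222222", "2062221234", "2062117777"]:
    print(n, is_valid_number(n))
```

Keeping the repetition check outside the regex makes the rule easy to change later, but it does not satisfy the question's constraint that a single regex must be used.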