I have to validate a 10-digit (US) phone number in NANP format (no special characters allowed) in .NET, and also make sure the last 7 digits of the phone number are non-repeating. So far, I have written the following regex to validate the NANP format:
^(?:[2-9][0-8][0-9])([2-9][0-9]{2}[0-9]{4})$
How do I modify this regex to also account for non-repeating last 7 digits? Please note that using two regexes is not an option due to constraints of existing code.
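For reference, here is a quick check of what the format-only pattern accepts. It is written in Python rather than C#, on the assumption that this is close enough: Python's re module understands the character classes and quantifiers used here, and also the lookahead and backreference syntax used later in this thread. The sample numbers are made up for illustration. Note that 2062222222 passes, which is exactly the case the question wants to exclude.

```python
import re

# The question's format-only pattern: a 3-digit area code (first digit 2-9,
# second digit 0-8, third digit 0-9) followed by a 7-digit local number
# whose first digit is 2-9.
NANP = re.compile(r"^(?:[2-9][0-8][0-9])([2-9][0-9]{2}[0-9]{4})$")

samples = [
    "2062221234",    # valid format
    "2062222222",    # valid format, but the last seven digits are all the same
    "1062221234",    # area code starts with 1
    "2962221234",    # second area-code digit is 9
    "2061221234",    # local number starts with 1
    "206-222-1234",  # separators are not allowed
]
for number in samples:
    print(number, bool(NANP.match(number)))
```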
Edit:
I have to check for consecutive duplicates across all 7 digits, i.e. reject the number only when all seven are the same digit. For example, 2062222222 should be considered invalid, whereas 2062221234 and 2062117777 should be considered valid.
Thanks
Are you talking about consecutive repeating digits, or do all seven digits have to be unique? For example:
2342497553 // consecutive duplicates
2345816245 // non-consecutive duplicates
2345816249 // no duplicates
This regex filters out consecutive duplicates:
^(?:[2-9][0-8][0-9])(?!.*(\d)\1)([2-9][0-9]{2}[0-9]{4})$
...while this one disallows any duplicate digits:
^(?:[2-9][0-8][0-9])(?!.*(\d).*\1)([2-9][0-9]{2}[0-9]{4})$
After the first three digits have been consumed, the lookahead tries to find a digit that's repeated, either immediately ((?!.*(\d)\1)) or with optional intervening characters ((?!.*(\d).*\1)). Because it's a negative lookahead, if it succeeds, the overall match fails.
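To see the difference between the two lookaheads, here is a small sketch (again in Python, assuming its re engine behaves like .NET's for these constructs) that runs both patterns against the three sample numbers above. The variable names are only for illustration.

```python
import re

# Rejects any two adjacent identical digits among the last seven.
NO_CONSECUTIVE = re.compile(r"^(?:[2-9][0-8][0-9])(?!.*(\d)\1)([2-9][0-9]{2}[0-9]{4})$")
# Rejects any digit that appears more than once anywhere among the last seven.
NO_DUPLICATES = re.compile(r"^(?:[2-9][0-8][0-9])(?!.*(\d).*\1)([2-9][0-9]{2}[0-9]{4})$")

samples = {
    "2342497553": "consecutive duplicates",
    "2345816245": "non-consecutive duplicates",
    "2345816249": "no duplicates",
}
for number, note in samples.items():
    print(number, note,
          "| no-consecutive:", bool(NO_CONSECUTIVE.match(number)),
          "| no-duplicates:", bool(NO_DUPLICATES.match(number)))
```

With these patterns, the first sample fails both checks, the second passes only the no-consecutive check, and the third passes both. Because the lookahead starts after the area code has been consumed, a repeated digit inside the area code does not affect the result.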
EDIT: It turns out the problem is simpler than I thought. To filter out numbers like 2345555555, where the last seven digits are identical, use this:
^(?:[2-9][0-8][0-9])(?!(\d)\1+$)([2-9][0-9]{2}[0-9]{4})$
It's important to include the end anchor ($), because without it the lookahead would also reject valid numbers like 2345555556. Alternatively, you could tell it to look for exactly six more of the captured digit: (?!(\d)\1{6}).
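The following Python sketch checks the anchored lookahead, the counted alternative, and a deliberately broken unanchored version against the numbers from the question. It assumes Python's re matches .NET here, and the pattern names are illustrative only.

```python
import re

# Rejects only when all seven local digits are the same digit (anchored form).
ALL_SAME_ANCHORED = re.compile(r"^(?:[2-9][0-8][0-9])(?!(\d)\1+$)([2-9][0-9]{2}[0-9]{4})$")
# Same intent: the captured digit followed by exactly six more copies of it.
ALL_SAME_COUNTED = re.compile(r"^(?:[2-9][0-8][0-9])(?!(\d)\1{6})([2-9][0-9]{2}[0-9]{4})$")
# Deliberately broken: without "$" the lookahead rejects any number whose
# 4th and 5th digits are equal, not just the all-identical case.
MISSING_ANCHOR = re.compile(r"^(?:[2-9][0-8][0-9])(?!(\d)\1+)([2-9][0-9]{2}[0-9]{4})$")

for number in ["2062222222", "2345555555", "2062221234", "2062117777", "2345555556"]:
    print(number,
          "anchored:", bool(ALL_SAME_ANCHORED.match(number)),
          "counted:", bool(ALL_SAME_COUNTED.match(number)),
          "no anchor:", bool(MISSING_ANCHOR.match(number)))
```

The anchored and counted forms agree on every sample. The unanchored one wrongly rejects 2062221234 and 2345555556. The counted form needs no extra anchor because the rest of the pattern already requires exactly seven local digits.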
I'm pretty sure the non-repeating part of this came up last night, and the general consensus was that regular expressions can't handle non-repetition directly: you'd have to spell out an unmanageably large number of alternative cases. I haven't actually seen it proven, but I'm pretty sure it's true. It boils down to the fact that regular expressions, in the classical sense, have no memory (backreferences like \1 are an extension beyond that). I suggest you use the regex to validate the format and run the number through a separate algorithm to check for repetition.
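Here is a minimal sketch of that split approach in Python: the question's format regex does the validation, and a plain-code check on the captured group handles repetition. The helper name is_valid_phone and its require_unique flag are hypothetical. The question says a second regex is not an option, and this sketch assumes that one regex plus ordinary code is still acceptable.

```python
import re

# Format-only pattern from the question; group 1 captures the last seven digits.
NANP = re.compile(r"^(?:[2-9][0-8][0-9])([2-9][0-9]{2}[0-9]{4})$")

def is_valid_phone(number: str, require_unique: bool = False) -> bool:
    """Hypothetical helper: check the NANP format with the regex, then check
    repetition in ordinary code instead of inside the pattern."""
    match = NANP.match(number)
    if match is None:
        return False
    local = match.group(1)  # the last seven digits
    if require_unique:
        # Stricter reading: no digit may appear twice among the last seven.
        return len(set(local)) == len(local)
    # Reading from the question's edit: reject only when all seven are the same digit.
    return len(set(local)) > 1

print(is_valid_phone("2062222222"))                       # all seven identical
print(is_valid_phone("2062117777"))                       # repeats, but not all identical
print(is_valid_phone("2345816245", require_unique=True))  # 5 appears twice
```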