Looking to build some regex to validate domain nam

2019-08-12 11:37发布

问题:

One of our clients validates email addresses in their own software prior to firing it via an API call to our system. The issue is however that their validation rules do not match those our system, therefore they are parsing and accepting addresses which break our rules. This is causing lots of failed calls.

They are parsing stuff like "dave@-whatever.com", this goes against RFC 952/RFC 1123 rules as it begins with a hyphen. They have asked that we provide them with our regex list so they can update validation on their platform to match ours.

So, I need to find/build an RFC 952/RFC 1123 accepted. I found this in another SO thread (i'm a lurker :)), would it be suitable and prevent these illegal domains from being sent?

"^(([a-zA-Z]|[a-zA-Z][a-zA-Z0-9\-]*[a-zA-Z0-9])\.)*([A-Za-z]|[A-Za-z][A-Za-z0-9\-]*[A-Za-z0-9])$";

回答1:

A domain part has a max length of 255 characters and can only consist of digits, ASCII characters and hyphens; a hyphen cannot come first.

Checking the validity of one domain component can be done using this regex, case insensitive, length notwithstanding:

[a-z0-9]+(-[a-z0-9]+)*

This is the normal* (special normal*)* pattern again, with normal being [a-z0-9] and special being -.

Then you take all this in another normal* (special normal*)* pattern as the normal part, and the special being ., and anchor it at the beginning and end:

^[a-z0-9]+(-[a-z0-9]+)*(\.[a-z0-9]+(-[a-z0-9]+)*)+$

If you cannot afford case insensitive matching, add A-Z to the character class.

But please note that it won't check for the max length of 255. It may be done using a positive lookahead, but the regex will become very complicated, and it is shorter to be using a string length function ;)