Is there a quick and dirty way to validate if the correct FQDN has been entered? Keep in mind there is no DNS server or Internet connection, so validation has to be done via regex/awk/sed.
Any ideas?
Regex is always going to be at best an approximation for things like this, and the rules change over time. The above regex was written with the following in mind and is specific to hostnames:
Hostnames are composed of a series of labels concatenated with dots. Each label is 1 to 63 characters long, and may contain:
Additionally:
some assumptions:
results: valid / invalid
EDIT: John Rix provided an alternative hack of the regex to make the specification of a TLD optional:
EDIT 2: someone asked for a version that works in JS. The reason it doesn't work there is that older JS engines do not support regex lookbehind (it was only added in ES2018). Specifically, the code
(?<!-)
which specifies that the previous character cannot be a hyphen. Anyway, here it is rewritten without the lookbehind, a little uglier but not much:
you could likewise make a similar replacement on John Rix's version.
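As a rough illustration of what dropping the lookbehind involves, here is a small Python sketch for a single label. It is not the pattern from this answer: the label alphabet (ASCII letters, digits, hyphens) and the names WITH_LOOKBEHIND, WITHOUT_LOOKBEHIND and agree are assumptions made for the example. The idea is to require an alphanumeric first and last character instead of asserting "not a hyphen" after the fact.

```python
import re

# Assumed label alphabet: ASCII letters, digits and hyphens (not taken from the answer).
# Form 1: lookarounds forbid a leading or a trailing hyphen.
WITH_LOOKBEHIND = re.compile(r"(?!-)[a-zA-Z0-9-]{1,63}(?<!-)")

# Form 2: same constraint without lookarounds. The first and last characters
# must be alphanumeric, with up to 61 characters allowed in between.
WITHOUT_LOOKBEHIND = re.compile(r"[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?")


def agree(label: str) -> bool:
    """Return True if both forms give the same verdict for one label."""
    a = WITH_LOOKBEHIND.fullmatch(label) is not None
    b = WITHOUT_LOOKBEHIND.fullmatch(label) is not None
    return a == b


samples = ["a", "ab", "a-b", "-ab", "ab-", "a--b", "", "a" * 63, "a" * 64]
for s in samples:
    accepted = WITHOUT_LOOKBEHIND.fullmatch(s) is not None
    print(repr(s[:12]), "accepted:", accepted, "forms agree:", agree(s))
```

The second form is longer, but it should port to engines without lookbehind since it only uses character classes and counted repetition.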
EDIT 3: if you want to allow trailing dots - which is technically allowed:
I wasn't familiar with trailing dot syntax till @ChaimKut pointed it out and I did some research.
Using trailing dots, however, seems to cause somewhat unpredictable results in the various tools I played with, so I would advise some caution.
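If you'd rather not bake the trailing dot into the regex, one option is to strip a single trailing dot first and validate what is left. The sketch below assumes this approach; the HOSTNAME pattern, the 253-character limit on the dot-free name, and the requirement of at least two labels are my own assumptions for illustration, not part of the regex in this answer.

```python
import re

# Hypothetical dot-free hostname pattern, used only for illustration:
# labels of letters, digits and hyphens, separated by single dots.
LABEL = r"[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?"
HOSTNAME = re.compile(rf"{LABEL}(?:\.{LABEL})+")


def is_valid_fqdn(name: str) -> bool:
    """Accept at most one trailing dot, then validate the rest."""
    if name.endswith("."):
        name = name[:-1]  # drop exactly one trailing dot
    if not 1 <= len(name) <= 253:  # assumed overall length limit
        return False
    return HOSTNAME.fullmatch(name) is not None


for candidate in ["example.com", "example.com.", "example.com..", "."]:
    print(repr(candidate), is_valid_fqdn(candidate))
```

Doing the normalization in one place also means the rest of your tooling only ever sees one spelling of the name.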
CONSIDERATION #1:
Please note that due to relaxed requirements in RFC-2181 DNS labels can consist of pretty much any combination of symbols (however, the length restrictions are still there):
"Any binary string whatever can be used as the label of any resource record. Implementations of the DNS protocols must not place any restrictions on the labels that can be used. In particular, DNS servers must not refuse to serve a zone because it contains labels that might not be acceptable to some DNS client programs." (https://tools.ietf.org/html/rfc2181#section-11)
CONSIDERATION #2:
"There is an additional rule that essentially requires that top-level domain names not be all-numeric" (https://tools.ietf.org/html/rfc3696#section-2)
Taking into account these two considerations, the correct regex looks like this:
/^(?!:\/\/)(?=.{1,255}$)((.{1,63}\.){1,127}(?![0-9]*$)[a-z0-9-]+\.?)$/i
See demo @ http://regexr.com/3g5j0
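For trying this offline, here is a minimal Python port of the pattern above. The only changes are dropping the JavaScript-style /.../i delimiters (so the escaped slashes become plain slashes) and using re.IGNORECASE for the i flag; the name FQDN and the test strings are mine.

```python
import re

# The pattern above, minus the JavaScript-style /.../i delimiters.
# re.IGNORECASE stands in for the trailing "i" flag.
FQDN = re.compile(
    r"^(?!://)(?=.{1,255}$)((.{1,63}\.){1,127}(?![0-9]*$)[a-z0-9-]+\.?)$",
    re.IGNORECASE,
)

for candidate in ["example.com", "EXAMPLE.COM.", "example.123", "localhost"]:
    print(candidate, bool(FQDN.match(candidate)))
```

The first two candidates match and the last two do not: example.123 is rejected by the all-numeric TLD check from consideration #2, and localhost is rejected because the pattern requires at least one dot-terminated label before the final one.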
The following expression
will match
but not match:
This regex is what you want:
It matches your example domains (groupa-zone1appserver.example.com, cod.eu, etc.)
I'll try to explain:
(?=^.{1,254}$)
matches domain names (which can begin with any character) that are between 1 and 254 characters long; it could also be 5,254 if we assume co.uk is the minimum length.
(^
starts the match.
(?:
defines a non-capturing group.
(?!\d+\.)
a domain label should not consist only of digits, so 1234.co.uk or abc.123.uk are not accepted, while 1a.ko.uk is.
[a-zA-Z0-9_\-]
labels may contain only the characters a-zA-Z0-9_-
{1,63}
the length of any domain level is at most 63 characters (it could be 2,63).
+
and
(?:[a-zA-Z]{2,})$)
the final part of the domain name must not be followed by anything else and must consist of at least 2 characters from a-zA-Z.
It's harder nowadays, with internationalized domain names and several thousand (!) new TLDs.
The easy part is that you can still split the components on ".".
You need a list of registerable TLDs. There's a site for that:
https://publicsuffix.org/list/effective_tld_names.dat
You only need to check the ICANN-recognized ones. Note that a registerable TLD can have more than one component, such as "co.uk".
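Since there is no Internet connection, the list would have to be a local copy. Here is a rough Python sketch of the lookup, assuming the file's "// ===BEGIN ICANN DOMAINS===" and "// ===END ICANN DOMAINS===" marker comments. PSL_SAMPLE is a tiny hand-written stand-in for the real file, the function names are made up, and wildcard ("*.") and exception ("!") rules are skipped for brevity.

```python
from typing import Optional, Set

# Tiny hand-written stand-in for a local copy of the list (not the real file).
PSL_SAMPLE = """\
// ===BEGIN ICANN DOMAINS===
com
uk
co.uk
// ===END ICANN DOMAINS===
// ===BEGIN PRIVATE DOMAINS===
github.io
// ===END PRIVATE DOMAINS===
"""


def load_icann_suffixes(text: str) -> Set[str]:
    """Collect plain rules from the ICANN section only."""
    suffixes: Set[str] = set()
    in_icann = False
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("// ===BEGIN ICANN DOMAINS==="):
            in_icann = True
        elif line.startswith("// ===END ICANN DOMAINS==="):
            in_icann = False
        elif in_icann and line and not line.startswith("//"):
            # Wildcard (*.) and exception (!) rules are skipped in this sketch.
            if not line.startswith(("*", "!")):
                suffixes.add(line.lower())
    return suffixes


def public_suffix(domain: str, suffixes: Set[str]) -> Optional[str]:
    """Return the longest listed suffix of domain, or None."""
    labels = domain.lower().rstrip(".").split(".")
    # Try the longest candidate first so "co.uk" wins over "uk".
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in suffixes:
            return candidate
    return None


icann = load_icann_suffixes(PSL_SAMPLE)
print(public_suffix("www.example.co.uk", icann))
print(public_suffix("host.invalid", icann))
```

With the real file you would read it from disk instead of PSL_SAMPLE; a full implementation also has to honor the wildcard and exception rules this sketch ignores.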
Then there's IDN and punycode. Domains are Unicode now. For example, "xn--nnx388a" is equivalent to "臺灣". Both of those are valid TLDs, incidentally.
For punycode conversion code, see "http://golang.org/src/pkg/net/http/cookiejar/punycode.go".
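If Python is available offline, its standard library ships an "idna" codec that can do the conversion in both directions. A minimal sketch follows; the helper names to_ascii and to_unicode are mine, and note that this built-in codec follows the older IDNA 2003 rules (RFC 3490), so treat it as an approximation rather than a full RFC 5890 implementation.

```python
def to_ascii(label: str) -> str:
    """Encode one label to its ASCII form ("xn--..." for non-ASCII input)."""
    return label.encode("idna").decode("ascii")


def to_unicode(label: str) -> str:
    """Decode an ASCII label; labels without "xn--" come back unchanged."""
    return label.encode("ascii").decode("idna")


encoded = to_ascii("臺灣")
print(encoded, to_unicode(encoded))
print(to_ascii("example"), to_unicode("example"))
```

Round-tripping a label through both helpers is a cheap sanity check that an "xn--" string really is well-formed punycode.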
Checking the syntax of each domain component has new rules, too. See RFC5890 at http://tools.ietf.org/html/rfc5890
Components can be either A-labels (ASCII only) or Unicode. ASCII labels either follow the old syntax, or begin "xn--", in which case they are a punycode version of a Unicode string.
The rules for Unicode are very complex, and are given in RFC5890. The rules are designed to prevent such things as mixing characters from left-to-right and right-to-left sets.
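A first pass can at least sort each component into one of those buckets before applying the detailed rules. The Python sketch below assumes that "the old syntax" means letters, digits and hyphens with no leading or trailing hyphen and at most 63 characters; the function name and the returned category strings are made up for illustration, and Unicode components are only flagged, not validated.

```python
import re

# Assumed "old syntax": letters, digits, hyphens; no leading/trailing hyphen; 1-63 chars.
LDH = re.compile(r"[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?")


def classify_label(label: str) -> str:
    """Return 'ascii', 'punycode', 'unicode' or 'invalid' for one component."""
    if not label:
        return "invalid"
    if not label.isascii():
        return "unicode"  # needs the RFC 5890 rules, which are not checked here
    if LDH.fullmatch(label) is None:
        return "invalid"
    if label.lower().startswith("xn--"):
        return "punycode"
    return "ascii"


for part in ["example", "xn--nnx388a", "臺灣", "-bad", ""]:
    print(repr(part), classify_label(part))
```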
Sorry there's no easy answer.