Fully qualified domain name validation

Posted 2019-01-08 13:32

Is there a quick and dirty way to validate if the correct FQDN has been entered? Keep in mind there is no DNS server or Internet connection, so validation has to be done via regex/awk/sed.

Any ideas?

Tags: regex bash fqdn
5 answers
Juvenile、少年°
Reply #2 · 2019-01-08 13:33
(?=^.{4,253}$)(^((?!-)[a-zA-Z0-9-]{1,63}(?<!-)\.)+[a-zA-Z]{2,63}$)

A regex is always going to be an approximation at best for things like this, and the rules change over time. The regex above is specific to hostnames and was written with the following in mind:

Hostnames are composed of a series of labels concatenated with dots. Each label is 1 to 63 characters long, and may contain:

  • the ASCII letters a-z (in a case insensitive manner),
  • the digits 0-9,
  • and the hyphen ('-').

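Additionally:

  • labels cannot start or end with a hyphen,
  • the whole name, including dots, is at most 253 characters long,
  • underscores are not allowed.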

some assumptions:

  • TLD is at least 2 characters and only a-z
  • we want at least 1 level above TLD

results: valid / invalid

  • 911.gov - valid
  • 911 - invalid (no TLD)
  • a-.com - invalid
  • -a.com - invalid
  • a.com - valid
  • a.66 - invalid
  • my_host.com - invalid (underscore)
  • typical-hostname33.whatever.co.uk - valid
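The results above can be reproduced with a short script. This is only a minimal sketch in Python, picked here because its `re` module supports both lookahead and lookbehind; the names `FQDN_RE` and `is_fqdn` are made up for the example and are not part of the answer.

```python
import re

# Pattern copied verbatim from the answer above.
# FQDN_RE and is_fqdn are illustrative names, not from the original post.
FQDN_RE = re.compile(
    r"(?=^.{4,253}$)(^((?!-)[a-zA-Z0-9-]{1,63}(?<!-)\.)+[a-zA-Z]{2,63}$)"
)


def is_fqdn(name: str) -> bool:
    """Return True if name matches the hostname pattern above."""
    return FQDN_RE.match(name) is not None


if __name__ == "__main__":
    samples = [
        "911.gov", "911", "a-.com", "-a.com", "a.com", "a.66",
        "my_host.com", "typical-hostname33.whatever.co.uk",
    ]
    for name in samples:
        print(f"{name}: {'valid' if is_fqdn(name) else 'invalid'}")
```

Note that in Python `$` also matches just before a trailing newline, so it is safer to strip the input first if it may end with one.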

EDIT: John Rix provided an alternative hack of the regex to make the specification of a TLD optional:

(?=^.{1,253}$)(^(((?!-)[a-zA-Z0-9-]{1,63}(?<!-))|((?!-)[a-zA-Z0-9-]{1,63}(?<!-)\.)+[a-zA-Z]{2,63})$)
  • 911 - valid
  • 911.gov - valid

EDIT 2: Someone asked for a version that works in JavaScript. The version above fails there because JavaScript engines did not support regex lookbehind at the time (it was only added in ES2018), specifically the (?<!-) construct, which asserts that the previous character is not a hyphen.

Anyway, here it is rewritten without the lookbehind. It is a little uglier, but not by much:

(?=^.{4,253}$)(^((?!-)[a-zA-Z0-9-]{0,62}[a-zA-Z0-9]\.)+[a-zA-Z]{2,63}$)

You could likewise make a similar replacement in John Rix's version.
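One way to gain some confidence in the lookbehind-free rewrite is to run both patterns over the same inputs and compare. The sketch below does that in Python; the sample list and the helper name `agree` are assumptions added for illustration, and agreement on a handful of samples is of course not a proof of equivalence.

```python
import re

# Both patterns are copied from the answer above.
WITH_LOOKBEHIND = re.compile(
    r"(?=^.{4,253}$)(^((?!-)[a-zA-Z0-9-]{1,63}(?<!-)\.)+[a-zA-Z]{2,63}$)"
)
WITHOUT_LOOKBEHIND = re.compile(
    r"(?=^.{4,253}$)(^((?!-)[a-zA-Z0-9-]{0,62}[a-zA-Z0-9]\.)+[a-zA-Z]{2,63}$)"
)

# Illustrative inputs: the answer's own examples plus two extra hyphen cases.
SAMPLES = [
    "911.gov", "911", "a-.com", "-a.com", "a.com", "a.66",
    "my_host.com", "typical-hostname33.whatever.co.uk",
    "a--b.com", "ab-.example.com",
]


def agree(name: str) -> bool:
    """True if both patterns give the same verdict for name."""
    return bool(WITH_LOOKBEHIND.match(name)) == bool(WITHOUT_LOOKBEHIND.match(name))


if __name__ == "__main__":
    for name in SAMPLES:
        verdict = "valid" if WITHOUT_LOOKBEHIND.match(name) else "invalid"
        print(f"{name}: {verdict} (patterns agree: {agree(name)})")
```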

EDIT 3: if you want to allow trailing dots - which is technically allowed:

(?=^.{4,253}$)(^((?!-)[a-zA-Z0-9-]{1,63}(?<!-)\.)+[a-zA-Z]{2,63}\.?$)

I wasn't familiar with the trailing-dot syntax until @ChaimKut pointed it out and I did some research.

However, trailing dots seem to cause somewhat unpredictable results in the various tools I tried, so I would advise some caution.
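To see what the trailing-dot variant changes, the two patterns can be compared directly. This is a small Python sketch; the names `STRICT`, `TRAILING_DOT_OK` and `accepts` are invented for the example.

```python
import re

# Original pattern (no trailing dot) and the EDIT 3 variant, both from the answer.
STRICT = re.compile(
    r"(?=^.{4,253}$)(^((?!-)[a-zA-Z0-9-]{1,63}(?<!-)\.)+[a-zA-Z]{2,63}$)"
)
TRAILING_DOT_OK = re.compile(
    r"(?=^.{4,253}$)(^((?!-)[a-zA-Z0-9-]{1,63}(?<!-)\.)+[a-zA-Z]{2,63}\.?$)"
)


def accepts(pattern: re.Pattern, name: str) -> bool:
    """True if the compiled pattern matches name."""
    return pattern.match(name) is not None


if __name__ == "__main__":
    for name in ["a.com", "a.com.", "a.com.."]:
        print(name, accepts(STRICT, name), accepts(TRAILING_DOT_OK, name))
```

Only a single trailing dot is tolerated by the variant; a doubled dot is still rejected by both patterns.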

贼婆χ
Reply #3 · 2019-01-08 13:35

CONSIDERATION #1:

Please note that due to relaxed requirements in RFC-2181 DNS labels can consist of pretty much any combination of symbols (however, the length restrictions are still there):

"Any binary string whatever can be used as the label of any resource record. Implementations of the DNS protocols must not place any restrictions on the labels that can be used. In particular, DNS servers must not refuse to serve a zone because it contains labels that might not be acceptable to some DNS client programs." (https://tools.ietf.org/html/rfc2181#section-11)

CONSIDERATION #2:

"There is an additional rule that essentially requires that top-level domain names not be all-numeric" (https://tools.ietf.org/html/rfc3696#section-2)

Taking into account these two considerations, the correct regex looks like this:

/^(?!:\/\/)(?=.{1,255}$)((.{1,63}\.){1,127}(?![0-9]*$)[a-z0-9-]+\.?)$/i

See demo @ http://regexr.com/3g5j0
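For readers without access to the demo, here is a rough Python translation of the pattern. The `/…/i` delimiters become the `re.IGNORECASE` flag and the escaped slashes are written as plain `//`; the name `RELAXED_RE` and the sample inputs are assumptions made for this sketch.

```python
import re

# Python form of the JavaScript-style literal above: the /i flag becomes
# re.IGNORECASE and "\/" becomes "/" (no escaping needed in Python).
RELAXED_RE = re.compile(
    r"^(?!://)(?=.{1,255}$)((.{1,63}\.){1,127}(?![0-9]*$)[a-z0-9-]+\.?)$",
    re.IGNORECASE,
)


def is_relaxed_fqdn(name: str) -> bool:
    """True if name matches the relaxed (RFC 2181 style) pattern."""
    return RELAXED_RE.match(name) is not None


if __name__ == "__main__":
    samples = [
        "example.com",          # ordinary name
        "my_host.example.com",  # underscore is accepted in non-TLD labels
        "example.123",          # all-numeric TLD is rejected
        "example",              # at least one dot is required
        "a" * 64 + ".com",      # label longer than 63 characters
    ]
    for name in samples:
        print(f"{name[:30]}: {is_relaxed_fqdn(name)}")
```

The main difference from the stricter hostname patterns is that non-TLD labels may contain any character, which follows from consideration #1.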

▲ chillily
Reply #4 · 2019-01-08 13:36

The following expression

(^((?=^.{4,253}$)(((http){0,1}|(http){0,1}|(ftp){0,1}|(ws){0,1})(s{0,1}):\/\/){0,1})((((?!-)[\pL0-9\-]{1,63})(?<!-)(\.)){1,})(((?!-)[a-z0-9\-]{1,63})(?<!-)((\/{0,1}[\pL\pN?=\-]*)+){1})$)

will match

https://www.tes1t.com/lets/to?878932572
https://www.test.co.uk/lets/to?878932572
http://www.test.com/lets/to?878932572
http://www.test.co.uk/lets/to?878932572
ftp://www.test.com/lets/to?878932572
subdomain.test.com/lets/to?878932572
subdomain.subdomain.test.net/lets/to?878932572

sub-domain.test.net/lets/to?878932572
sub-domain.test.net/lets-go/to?878932572
www.test.net/lets/to?878932572
www.test-test.com/
www.test-test.com

subdomain.subdomainsubdomainsuèdomainsubdomainsubdomainsubdomainsubdomain.net/let2s/to?=878932572

www.test-test.co.uk
http://www.test-test-.com/test
www.test-teèst.co.uk/lets
www.test-test.co.uk/lets/
www.test-test.co.uk/lets/to?
test-test.co.uk/lets/to?
test-test.co.uk/lets/
test-test.co.uk/lets
test-test.co.uk
http://test.com/lets/to?878932572
https://test.com/lets/to?878932572
ftp://test.com/lets/to?878932572
ftps://test.com/lets/to?878932572
ws://test.com/lets/to?878932572aa
wss://test.com/lets/to?=878932572bar
test.com

subdomain.test.test.test.test.test.test.test.test.test.test.test.test.test.test.test.test.test.test.test.test.test.test.test.test.test.test.test.test.test.test.test.test.khbdomainsubdomainsubdomain.test.net/lets/to?87893257

but not match:

www.-test-fail-.com
www.-test-fail.com
-test-fail.com
test-fail-.com

subdomain.subdomainsubdomainsubdomainsubdomainsubdomainsubdomainsubdomainsubdomainsubdomainsubdomainubdomainsubdomainsubdomain.test.net/lets/to?878932572

subdomain.subdomainsubdomainsubdcnvcnvcnofhfhghgfhvnhj-mainsubdomainsubdohhghghghfhgffgjh-gfhfdhfdghmainsubdocgvhngvnbnbmghghghaihgfjgfnfhfdghgsufghgghghhdfjgffsgfbdomainsubdomainsubdomainsubdomainsubdomainsubdomainsubdomain.test.net/lets/to?878932572

subdomain.test.test..test..test..test..test..test..test..test..test..test..test..test..test..test..test..test..test..test..test..test..test..test..test..test..test..test..test..test..test..test..test.khbdomainsubdomainsubdomain.test.net/lets/to?87893257
我只想做你的唯一
Reply #5 · 2019-01-08 13:44

This regex is what you want:

(?=^.{1,254}$)(^(?:(?!\d+\.)[a-zA-Z0-9_\-]{1,63}\.?)+(?:[a-zA-Z]{2,})$)

It matches your example domains (groupa-zone1appserver.example.com, cod.eu, etc.).

I'll try to explain:

(?=^.{1,254}$) restricts the whole name (which can begin with any character) to between 1 and 254 characters. The lower bound could also be 5 if we assume co.uk is the shortest valid name.

(^ anchors the match at the start of the string.

(?: opens a non-capturing group.

(?!\d+\.) a label must not consist only of digits, so 1234.co.uk and abc.123.uk are rejected while 1a.ko.uk is accepted.

[a-zA-Z0-9_\-] each label may contain only the characters a-z, A-Z, 0-9, _ and -.

{1,63} each label is at most 63 characters long (this could also be {2,63}).

+ repeats the group one or more times, once per label.

(?:[a-zA-Z]{2,})$) the final label must be at least 2 characters long, contain only a-z or A-Z, and be followed by nothing else.
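The examples mentioned in this explanation can be checked with a few lines of Python. This is a sketch only; `ANSWER_RE` and `matches` are names invented for it.

```python
import re

# Pattern copied from the answer above.
ANSWER_RE = re.compile(
    r"(?=^.{1,254}$)(^(?:(?!\d+\.)[a-zA-Z0-9_\-]{1,63}\.?)+(?:[a-zA-Z]{2,})$)"
)


def matches(name: str) -> bool:
    """True if name matches the pattern above."""
    return ANSWER_RE.match(name) is not None


if __name__ == "__main__":
    samples = [
        "groupa-zone1appserver.example.com",
        "cod.eu",
        "1234.co.uk",
        "abc.123.uk",
        "1a.ko.uk",
        "localhost",
    ]
    for name in samples:
        print(f"{name}: {matches(name)}")
```

One thing worth noticing when running it: because the `\.?` inside the group is optional, the pattern also accepts dotless input such as `localhost`, so it may be more permissive than the explanation suggests.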

劫难
Reply #6 · 2019-01-08 13:49

It's harder nowadays, with internationalized domain names and several thousand (!) new TLDs.

The easy part is that you can still split the components on ".".

You need a list of registerable TLDs. There's a site for that:

https://publicsuffix.org/list/effective_tld_names.dat

You only need to check the ICANN-recognized ones. Note that a registerable TLD can have more than one component, such as "co.uk".
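The split-and-look-up idea might be sketched as follows in Python. The set `SAMPLE_SUFFIXES` is a tiny hand-picked stand-in, not the real list, and `split_on_suffix` is a hypothetical helper name. A real check would load the full file from the URL above (offline, from a saved copy) and would also have to handle the list's wildcard and exception rules, which this sketch ignores.

```python
from typing import Optional, Tuple

# Hypothetical excerpt for illustration only; the real list is far longer.
SAMPLE_SUFFIXES = {"com", "net", "uk", "co.uk"}


def split_on_suffix(name: str) -> Optional[Tuple[str, str]]:
    """Return (prefix, suffix) for the longest known suffix, or None.

    At least one label must remain in front of the suffix.
    """
    if name.endswith("."):
        name = name[:-1]  # tolerate a single trailing dot
    labels = name.lower().split(".")
    # Start with the longest candidate so "co.uk" wins over "uk".
    for i in range(1, len(labels)):
        suffix = ".".join(labels[i:])
        if suffix in SAMPLE_SUFFIXES:
            return ".".join(labels[:i]), suffix
    return None


if __name__ == "__main__":
    for name in ["www.test.co.uk", "test.com", "com", "example.invalid"]:
        print(name, "->", split_on_suffix(name))
```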

Then there's IDN and punycode. Domains are Unicode now. For example,

"xn--nnx388a" is equivalent to "臺灣". Both of those are valid TLDs, incidentally.

For punycode conversion code, see "http://golang.org/src/pkg/net/http/cookiejar/punycode.go".

Checking the syntax of each domain component has new rules, too. See RFC5890 at http://tools.ietf.org/html/rfc5890

Components can be either A-labels (ASCII only) or Unicode. ASCII labels either follow the old syntax, or begin "xn--", in which case they are a punycode version of a Unicode string.
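As a small illustration of the "xn--" convention, Python's built-in `punycode` codec can convert a single label in both directions. The helper names `to_a_label` and `to_unicode` are made up for this sketch, and the codec performs only the raw conversion: it does not apply any of the RFC 5890 validity rules, so this is not a complete IDN check.

```python
ACE_PREFIX = "xn--"  # prefix that marks a punycode-encoded label


def to_a_label(label: str) -> str:
    """Encode a Unicode label to its xn-- form; ASCII labels pass through."""
    if label.isascii():
        return label
    return ACE_PREFIX + label.encode("punycode").decode("ascii")


def to_unicode(label: str) -> str:
    """Decode an xn-- label back to Unicode; other labels pass through."""
    if label.lower().startswith(ACE_PREFIX):
        return label[len(ACE_PREFIX):].encode("ascii").decode("punycode")
    return label


if __name__ == "__main__":
    encoded = to_a_label("臺灣")
    print(encoded, to_unicode(encoded))  # xn--nnx388a 臺灣
```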

The rules for Unicode are very complex, and are given in RFC5890. The rules are designed to prevent such things as mixing characters from left-to-right and right-to-left sets.

Sorry there's no easy answer.
