Number in the top-level domain?

Posted 2020-03-01 08:21

Can a top-level domain contain a number at the end? I don't know anything about DNS rules, but when I check test@null.com1 with PHP's filter_var() function and FILTER_VALIDATE_EMAIL, validation succeeds.

3 answers
神经病院院长
Reply #2 · 2020-03-01 08:52

Conceptually, there is nothing that disallows numbers in a TLD and in the future, who knows, perhaps there will be numeric TLDs.

There are no TLDs at the moment that have numbers in them. The function probably does not test against a list of known TLDs (as that list is subject to change), but only checks the address lexically.

chillily
Reply #3 · 2020-03-01 08:52

Actually there are quite a few TLDs currently in use that contain numbers:

XN--1QQW23A
XN--3BST00M
XN--3DS443G
XN--3E0B707E
XN--45BRJ9C
XN--4GBRIM
XN--55QW42G
XN--55QX5D
XN--6FRZ82G
XN--6QQ986B3XL
XN--80ADXHKS
XN--80AO21A
XN--80ASEHDB
XN--80ASWG
XN--90A3AC
XN--C1AVG
XN--CG4BKI
XN--CLCHC0EA0B2G2A9GCD
XN--CZR694B
XN--CZRU2D
XN--D1ACJ3B
XN--FIQ228C5HS
XN--FIQ64B
XN--FIQS8S
XN--FIQZ9S
XN--FPCRJ9C3D
XN--FZC2C9E2C
XN--GECRJ9C
XN--H2BRJ9C
XN--I1B6B1A6A2E
XN--IO0A7I
XN--J1AMH
XN--J6W193G
XN--KPRW13D
XN--KPRY57D
XN--KPUT3I
XN--L1ACC
XN--LGBBAT1AD8J
XN--MGB9AWBF
XN--MGBA3A4F16A
XN--MGBAAM7A8H
XN--MGBAB2BD
XN--MGBAYH7GPA
XN--MGBBH1A71E
XN--MGBC0A9AZCG
XN--MGBERP4A5D4AR
XN--MGBX4CD0AB
XN--NGBC5AZD
XN--NQV7F
XN--NQV7FS00EMA
XN--O3CW4H
XN--OGBPF8FL
XN--P1AI
XN--PGBS0DH
XN--Q9JYB4C
XN--RHQV96G
XN--S9BRJ9C
XN--SES554G
XN--UNUP4Y
XN--VHQUV
XN--WGBH1C
XN--WGBL6A
XN--XHQ521B
XN--XKC2AL3HYE2A
XN--XKC2DL3A5EE0H
XN--YFRO4I67O
XN--YGBI2AMMX
XN--ZFR164B

You can see an up-to-date list at data.iana.org/TLD/tlds-alpha-by-domain.txt, or a list with descriptions at swcs.com.au/tld.htm
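
To reproduce a list like the one above, one could filter the IANA file for entries that contain a digit. This is only a minimal sketch: it assumes the file is plain text with one TLD per line plus comment lines starting with `#`, the function name `tlds_with_digits` and the sample text are made up for illustration, and nothing is downloaded here.

```python
def tlds_with_digits(listing: str) -> list[str]:
    """Return the TLDs in `listing` that contain at least one ASCII digit."""
    result = []
    for line in listing.splitlines():
        tld = line.strip()
        # Skip blank lines and comment lines (assumed to start with '#').
        if not tld or tld.startswith("#"):
            continue
        if any(ch in "0123456789" for ch in tld):
            result.append(tld)
    return result


# Hypothetical excerpt in the assumed file format, not the real file.
sample = """# sample header line
COM
ORG
XN--FIQS8S
XN--P1AI
"""
print(tlds_with_digits(sample))
```

On this sample only the two XN-- entries are kept, which matches the observation that the digit-containing TLDs in the list are all IDN ones.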

乱世女痞
Reply #4 · 2020-03-01 08:55

Can a top-level domain contain a number at the end?

Yes, technically, unless it is purely numeric: under current rules a purely numeric label cannot be a TLD, for an easily understood reason (to avoid ambiguity with IP addresses). In practice, however, a TLD cannot contain a number at the end unless it is an IDN TLD, because of rules enforced by ICANN.

Let us go back to some RFCs to have some clearer definitions of things:

RFC 952: DOD INTERNET HOST TABLE SPECIFICATION (October 1985)

This is the definition of an Internet "hostname" back then:

A "name" (Net, Host, Gateway, or Domain name) is a text string up to 24 characters drawn from the alphabet (A-Z), digits (0-9), minus sign (-), and period (.). Note that periods are only allowed when they serve to delimit components of "domain style names". (See RFC-921, "Domain Name System Implementation Schedule", for background). No blank or space characters are permitted as part of a name. No distinction is made between upper and lower case. The first character must be an alpha character. The last character must not be a minus sign or period.

Note that this also has the following:

Single character names or nicknames are not allowed.

Hence at that point:

  • com1 is a valid TLD
  • 3com is not ("The first character must be an alpha character.")
  • 42 is not (same reason)
  • 1 is not (same reason)
  • a is not ("Single character names or nicknames are not allowed.")
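
The five cases above can be checked mechanically. The following is a rough sketch of one reading of the RFC 952 rules applied to a single period-free name; the function name `is_rfc952_name` is invented for this example, and applying the 24-character limit to a lone component is an assumption.

```python
import re

# First character a letter, then letters/digits/hyphens, last character a
# letter or digit. Requiring both a first and a last character also rules
# out single-character names.
_RFC952_COMPONENT = re.compile(r"[A-Za-z][A-Za-z0-9-]*[A-Za-z0-9]")


def is_rfc952_name(name: str) -> bool:
    """Check one period-free name against the RFC 952 rules quoted above."""
    return len(name) <= 24 and _RFC952_COMPONENT.fullmatch(name) is not None


for candidate in ["com1", "3com", "42", "1", "a"]:
    print(candidate, is_rfc952_name(candidate))
```

Only com1 passes; the other four are rejected for the reasons given in the list.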

RFC 1034: DOMAIN NAMES - CONCEPTS AND FACILITIES (November 1987)

This is one of the RFCs that created the DNS as we know it today. For compatibility reasons it defined hostnames as a sequence of labels, where a label is defined as follows:

They must start with a letter, end with a letter or digit, and have as interior characters only letters, digits, and hyphen. There are also some restrictions on the length. Labels must be 63 characters or less.

The TLD is one label among others (the L in TLD). Per the above rule, com1 is a valid label, and hence a valid TLD, whereas 3com would not have been. This brings us directly to the following amendment.

RFC 1123: Requirements for Internet Hosts -- Application and Support (October 1989)

This amends the previous RFC by changing one rule:

The syntax of a legal Internet host name was specified in RFC-952 [DNS:4]. One aspect of host name syntax is hereby changed: the restriction on the first character is relaxed to allow either a letter or a digit. Host software MUST support this more liberal syntax.

So at that point:

  • com1 is a valid TLD
  • 3com is also valid
  • 42 is valid
  • 1 is valid
  • a is valid
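
As a rough illustration of the relaxed syntax, here is a sketch that takes the RFC 1034 label shape and only loosens the first-character rule, as RFC 1123 describes. The name `is_rfc1123_label` is invented for this example, and accepting single-character labels follows the reading used in the list above.

```python
import re

# First character: letter or digit. Interior: letters, digits, hyphens.
# Last character: letter or digit. Total length: 1 to 63 characters.
_RFC1123_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def is_rfc1123_label(label: str) -> bool:
    """Check a single label against the relaxed host name syntax."""
    return _RFC1123_LABEL.fullmatch(label) is not None


for candidate in ["com1", "3com", "42", "1", "a"]:
    print(candidate, is_rfc1123_label(candidate))
```

All five candidates are accepted by this purely syntactic check; the all-numeric cases are only excluded by the additional considerations discussed next.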

For the case of "numerical" TLDs, the following rules from the same RFC apply:

Whenever a user inputs the identity of an Internet host, it SHOULD be possible to enter either (1) a host domain name or (2) an IP address in dotted-decimal ("#.#.#.#") form. The host SHOULD check the string syntactically for a dotted-decimal number before looking it up in the Domain Name System.

and

If a dotted-decimal number can be entered without such identifying delimiters, then a full syntactic check must be made, because a segment of a host domain name is now allowed to begin with a digit and could legally be entirely numeric (see Section 6.1.2.4). However, a valid host name can never have the dotted-decimal form #.#.#.#, since at least the highest-level component label will be alphabetic.
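
The syntactic check described in these two passages could look roughly like the sketch below. The function name `classify_host_input` and its return strings are made up for illustration, and the check deliberately looks only at the #.#.#.# shape, not at whether each number fits in an octet.

```python
import re

# Four groups of ASCII digits separated by periods: the "#.#.#.#" form.
_DOTTED_DECIMAL = re.compile(r"[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+")


def classify_host_input(text: str) -> str:
    """Return 'ip-address' for the dotted-decimal form, else 'domain-name'."""
    if _DOTTED_DECIMAL.fullmatch(text):
        return "ip-address"
    return "domain-name"


for text in ["74.125.244.123", "example.com", "1.2.3.com1"]:
    print(text, classify_host_input(text))
```

With a check like this, a name whose rightmost label is all digits would be treated as an IP address and never looked up in the DNS, which is why the RFC expects the highest-level label to be alphabetic.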

RFC 1738: Uniform Resource Locators (URL) (December 1994)

This also touches on the TLD, stating:

The fully qualified domain name of a network host, or its IP address as a set of four decimal digit groups separated by ".". Fully qualified domain names take the form as described in Section 3.5 of RFC 1034 [13] and Section 2.1 of RFC 1123 [5]: a sequence of domain labels separated by ".", each domain label starting and ending with an alphanumerical character and possibly also containing "-" characters. The rightmost domain label will never start with a digit, though, which syntactically distinguishes all domain names from the IP addresses.

RFC 3696: Application Techniques for Checking and Transformation of Names (February 2004)

This was needed to introduce IDNs (Internationalized Domain Names) and it has this to say:

Any characters, or combination of bits (as octets), are permitted in DNS names. However, there is a preferred form that is required by most applications. This preferred form has been the only one permitted in the names of top-level domains, or TLDs. In general, it is also the only form permitted in most second-level names registered in TLDs, although some names that are normally not seen by users obey other rules. It derives from the original ARPANET rules for the naming of hosts (i.e., the "hostname" rule) and is perhaps better described as the "LDH rule", after the characters that it permits. The LDH rule, as updated, provides that the labels (words or strings separated by periods) that make up a domain name must consist of only the ASCII [ASCII] alphabetic and numeric characters, plus the hyphen. No other symbols or punctuation characters are permitted, nor is blank space. If the hyphen is used, it is not permitted to appear at either the beginning or end of a label. There is an additional rule that essentially requires that top-level domain names not be all- numeric.

In fact, as soon as IDNs are involved, and there are IDN TLDs (both ccTLDs and gTLDs now), the chosen encoding generates an ASCII string of the form xn--something, where the something part can contain digits, in principle even at the end; another answer lists many such TLDs.
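
To see where the digits in such labels come from, here is a small sketch that encodes two Unicode TLD labels to their ASCII form. It relies on Python's built-in "idna" codec, which implements the older IDNA 2003 rules rather than RFCs 5890-5894, so treat it as an illustration only; the helper name `to_a_label` is invented.

```python
def to_a_label(unicode_label: str) -> str:
    """Encode one Unicode label to its ASCII (xn--...) form."""
    # The built-in "idna" codec applies nameprep and Punycode and adds
    # the "xn--" prefix when the label contains non-ASCII characters.
    return unicode_label.encode("idna").decode("ascii")


for label in ["中国", "рф"]:
    print(label, "->", to_a_label(label))
```

Both results correspond (in lowercase) to entries in the list from the other answer, XN--FIQS8S and XN--P1AI.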

However, it is not really clear where the "additional rule" in the last sentence comes from.

RFC 4697: Observed DNS Resolution Misbehavior (October 2006)

Not defining anything, but providing some interesting facts:

The root name servers receive a significant number of A record queries where the QNAME looks like an IPv4 address.

and

A possible solution is to delegate these numeric TLDs from the root zone to a separate set of servers to absorb the traffic.

This clearly shows that, in the wild, some applications (perhaps by mistake) send queries for names formatted like IPv4 addresses, that is, with a fully numeric "TLD". At the very least, it shows that this works technically.

There was in fact an experiment to launch a .42 registry, obviously completely outside the ICANN ecosystem. You can see a summary of it at http://www.dotsauce.com/experimental-numeric-tld-42-domain/ and an archive of their main explanations at https://web.archive.org/web/20101222151118/http://register.42registry.org:80/ (in French).

It did not go far, even though it technically worked.

It showed, for example, that Microsoft-based operating systems did not handle purely numeric TLDs at all by default, although a patch was provided for that: https://support.microsoft.com/en-us/help/947228/error-message-when-you-try-to-join-a-windows-vista-based-client-comput "When you try to join a Windows Vista-based client computer to a top level domain (TLD) that has a purely numeric suffix, the Windows Vista-based client computer cannot join the domain. [..] This behavior is by design."

Internet-Draft draft-liman-tld-names-06: Top Level Domain Name Specification (November 2011)

This finally gives some explanation of why purely numeric TLDs, or even TLDs containing a single digit, are sometimes considered invalid even though that is not a clear consequence of the specifications above:

(section 2.1 below refers to content in RFC 1123, quoted above)

In addition, the DISCUSSION section of Section 2.1 says:

 'However, a valid host name can never have the dotted-decimal form
 #.#.#.#, since at least the highest-level component label will be
 alphabetic.'  [Section 2.1]

Some implementers may have understood the above phrase 'will be alphabetic' to be a protocol restriction.

But it basically just recommends going with the flow and keeping the same restrictions:

Neither [RFC0952] nor [RFC1123] explicitly states the reasons for these restrictions. It might be supposed that human factors were a consideration; [RFC1123] appears to suggest that one of the reasons was to prevent confusion between dotted-decimal IPv4 addresses and host domain names. In any case, it is reasonable to believe that the restrictions have been assumed in some deployed software, and that changes to the rules should be undertaken with caution.

Hence it offered this definition:

traditional-tld-label = 1*63(ALPHA)
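
Read literally, that ABNF rule means "between 1 and 63 ASCII letters". A minimal sketch of the same rule as a regular expression, with the function name `is_traditional_tld_label` invented for this example:

```python
import re

# ABNF: traditional-tld-label = 1*63(ALPHA)
_TRADITIONAL_TLD = re.compile(r"[A-Za-z]{1,63}")


def is_traditional_tld_label(label: str) -> bool:
    """Check a label against the draft's letters-only definition."""
    return _TRADITIONAL_TLD.fullmatch(label) is not None


for candidate in ["com", "com1", "3com", "42"]:
    print(candidate, is_traditional_tld_label(candidate))
```

Under this definition only com passes; any digit, wherever it appears, makes the label non-traditional.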

This draft never became an RFC because not everyone agreed with it. You can find a thread with dissenting voices at https://www.ietf.org/mail-archive/web/dnsop/current/msg08866.html ; basically, it was not clear whether there had been a restriction in the past that was now being relaxed a little, or whether there never was a restriction to begin with and people had simply implemented systems incorrectly.

For example, see this Chromium/Chrome bug report: https://bugs.chromium.org/p/chromium/issues/detail?id=31405 Browsing failed when using a TLD that started with a digit or was purely numeric (it worked if the TLD ended with a digit preceded by letters). This was not considered a bug and was not fixed, because the browser ships with a list of TLDs so that it can tell which ones are valid, in addition to testing their syntax.

ICANN Application Guidebook for new TLDs (June 2012)

Available at https://newgtlds.icann.org/en/applicants/agb/guidebook-full-04jun12-en.pdf it says the following starting at page 64:

The ASCII label (i.e., the label as transmitted on the wire) must be valid as specified in technical standards Domain Names: Implementation and Specification (RFC 1035), and Clarifications to the DNS Specification (RFC 2181) and any updates thereto.

The ASCII label must be a valid host name, as specified in the technical standards DOD Internet Host Table Specification (RFC 952), Requirements for Internet Hosts — Application and Support (RFC 1123), and Application Techniques for Checking and Transformation of Names (RFC 3696), Internationalized Domain Names in Applications (IDNA)(RFCs 5890-5894), and any updates thereto. This includes the following:

The ASCII label must consist entirely of letters (alphabetic characters a-z), or

The label must be a valid IDNA A-label (further restricted as described in Part II below).

Note especially: The ASCII label must consist entirely of letters (alphabetic characters a-z)

This immediately rules out any fully numeric label, and in fact any digit at all, including at the end, except for IDN TLDs, the ones of the form xn--something.
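
The two-branch rule quoted from the guidebook can be approximated as follows. This is only a shape check under stated assumptions: the name `is_agb_style_label` is invented, the 63-character limit is taken from the label rules quoted earlier, and an "xn--" label is accepted on its letters-digits-hyphen shape alone, whereas real A-label validation (RFCs 5890-5894) and the guidebook's further restrictions are stricter.

```python
import re

_LETTERS_ONLY = re.compile(r"[a-z]+")
# Shape of an A-label only: "xn--" followed by letters, digits, or hyphens,
# ending with a letter or digit. This does NOT prove it is valid Punycode.
_A_LABEL_SHAPE = re.compile(r"xn--[a-z0-9-]*[a-z0-9]")


def is_agb_style_label(label: str) -> bool:
    """Letters only, or something shaped like an IDNA A-label."""
    label = label.lower()
    if len(label) > 63:
        return False
    if _LETTERS_ONLY.fullmatch(label):
        return True
    return _A_LABEL_SHAPE.fullmatch(label) is not None


for candidate in ["com", "com1", "123", "xn--fiqs8s"]:
    print(candidate, is_agb_style_label(candidate))
```

Here com and xn--fiqs8s pass, while com1 and 123 are rejected, which is the outcome the guidebook wording implies for the question's test@null.com1 example.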

Note that someone asked ICANN directly about this and got the following reply, shown at https://domaingang.com/domain-news/icann-applicant-handbook-this-is-why-we-cannot-have-numeric-gtlds/ :

Please note Numeric TLD’s were prohibited in the first round of applications. The prohibition on numeric gTLDs in the applicant guidebook (http://newgtlds.icann.org/en/applicants/agb) derives from a number of technical concerns regarding the ability of such domains to operate properly. Domain names are often used in place where other kinds of identifiers may be used like IP addresses.

The fact that a TLD is all alphabetic is often a key determinant for software in identifying a domain name. If a TLD such as “.123” were allowed, you could have a domain name of “74.125.244.123” which would be difficult to discriminate from an IP address “74.125.244.123.”. There are also other considerations: some technical standards documentation states that TLDs will be alphabetical, which has been codified as an assumption in software also.

The limitation in the AGB to alphabetic characters was designed to limit these scenarios that means such TLDs are not likely to work well in software, as well as limit potential security issues that may result from the same issues.
