What is the maximum length of an IDNA converted domain name?

Posted 2019-03-26 00:22

Question:

First things first:

I'm storing multiple domains in a database, after converting each and every domain name to its IDNA version. What I need to know is the maximum length such an IDNA-converted domain name can have, so I can define the database field's max length.

Known fact:

Now, I know the maximum number of characters in a domain name (including any subdomains) is 255 characters.

Where I lost it:

That's easy at first glance, but... does this mean regular ASCII characters or international characters (think UTF-8 encoding)?

To give you an example: The domain "müller.de" has 9 characters when I ignore that "ü" is an international character that needs more bytes to be represented. The IDNA version of "müller.de" is "xn--mller-kva.de", which has 16 characters. This shows there's definitely a difference in maximum length depending on whether the name is IDNA converted or not.

Depending on what kind of characters they mean, the 255-character maximum could be the international character version, the IDNA converted version or even both.

And that's where I lost it a bit... especially, since I have to take into account that not all domains will be sane and stuff like "öüßüöäéèê.example.äöüßüöäéèê-äöüßüöäéèê.test.äöüßüöäéèê.com" and even worse is to be expected.

So, "guessing" and "hoping for the best" is not an option. I need to know for sure...

The question is:

Based on the known fact that the maximum number of characters in a domain name (including any subdomains) is 255 characters... what is the maximum length of an IDNA converted domain name?

Or did they mean the IDNA converted version (punycode) is also restricted to 255 characters (which would mean that domains with international/unicode characters would actually have shorter limits in their unicode representation, because their IDNA converted version would have to respect the 255 char limit)?

Answer 1:

My understanding is that the 255-character limit is to be considered after the IDNA conversion.

This is because DNS records have this character limit, and in general DNS records can only contain letters, digits and hyphens (from Wikipedia). The DNS server therefore uses the Punycode version of the IDN for its record, not the Unicode version.
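To make the difference between the two forms concrete, here is a minimal Python sketch. It assumes Python's built-in `idna` codec (which implements the original IDNA 2003 rules) is an acceptable stand-in for whatever converter is actually in use; the helper name `to_ace` is made up for this illustration.

```python
def to_ace(domain: str) -> str:
    """Convert a Unicode domain name to its ASCII (Punycode) form.

    Assumption: the built-in "idna" codec (IDNA 2003) is good enough
    for illustration; other converters may differ for some characters.
    """
    return domain.encode("idna").decode("ascii")


name = "m\u00fcller.de"  # "müller.de", written with an escape for clarity
ace = to_ace(name)

print(len(name))                  # characters in the Unicode form
print(len(name.encode("utf-8")))  # bytes in UTF-8 ("ü" takes two bytes)
print(ace, len(ace))              # the form that DNS actually stores
```

The length that matters for the DNS limit is the last one, `len(ace)`.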



Answer 2:

OK, I think I found out myself and this snippet I found (by searching the internet) helped:

There were essentially two different options open for introducing internationalized domain names (IDN). The first was to make adjustments to the domain name system (DNS) which would allow unicode characters to be used directly. It was felt that this was too drastic a measure, and hence the second option was chosen. This involved compiling an algorithm to specify how a unicode string should be converted into a permitted ASCII domain name. This ACE string (ACE stands for ASCII Compatible Encoding) is then entered into the DNS. The introduction of IDN means that, for the very first time, the entry in the DNS is no longer identical with the domain name.

— Source

The answer is that the length to respect is the 255 character limit as DNS expects it.

My suspicion was correct. The domain name and the entry in the DNS are two different things with IDN. It's the maximum length of the DNS entry that counts.

The domain name "müller.de" has 9 characters, but the corresponding ACE (ASCII Compatible Encoding) string "xn--mller-kva.de" has 16 characters.

It's the ACE string that is used by DNS, and it's the ACE string that falls under the 255-character limit. This means the maximum length of the Unicode version of a domain name is not a fixed number: it depends on which Unicode characters are used and on whether the string still fits within the 255-character limit after IDNA conversion.

Geez, the specs sure could've been a bit clearer on things like this. Especially as international domain names have been around since somewhere near March 1st, 2004. But I found the answer, and that's what counts.

Perhaps this can help someone who's having the same question.

The simple answer related to my database field length is 255 CHARs.

The fact that I store the domain names in their IDNA converted (punycode/ACE string) version only confirms this maximum character limit.
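As a rough illustration of checking a name against that limit before storing it, the following Python sketch converts the name first and measures the result afterwards. The function name `fits_dns_limit` and the constant are hypothetical, the limit of 255 is simply the figure used in this thread, and the built-in `idna` codec (IDNA 2003) is assumed as the converter.

```python
MAX_DNS_NAME_LENGTH = 255  # figure used in this thread; pass another limit if needed


def fits_dns_limit(domain: str, limit: int = MAX_DNS_NAME_LENGTH) -> bool:
    """Return True if the IDNA-converted form of `domain` fits within `limit`."""
    try:
        ace = domain.encode("idna").decode("ascii")
    except UnicodeError:
        # The codec refuses names that contain an empty or over-long label.
        return False
    # Measure the ASCII form, not the Unicode form.
    return len(ace) <= limit


print(fits_dns_limit("m\u00fcller.de"))
print(fits_dns_limit(".".join(["a" * 60] * 5)))  # 304 ASCII characters
```

Because the check runs on the converted string, a Unicode name that looks short can still be rejected.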



Answer 3:

RFC 3492 says this about one of the features of Punycode, the encoding used by IDNA:

Efficient encoding: The ratio of basic string length to extended string length is small. This is important in the context of domain names because RFC1034 restricts the length of a domain label to 63 characters.

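That is it. 63 characters is the maximum length for any domain label (each dot-separated part of a name), regardless of whether it is in IDNA or in ASCII form. Note that this is a per-label limit; the limit on the whole name discussed in the other answers still applies on top of it.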
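The 63-character restriction quoted from the RFC applies to each label after conversion, so a per-label check could look like the sketch below. The helper name `bad_labels` is made up for this example, and Python's built-in `idna` codec (IDNA 2003) is assumed as the converter.

```python
from typing import List

MAX_LABEL_LENGTH = 63  # per-label limit quoted from the RFC above


def bad_labels(domain: str) -> List[str]:
    """Return the labels that are empty or exceed 63 characters once converted."""
    bad = []
    for label in domain.split("."):
        try:
            ace = label.encode("idna")
        except UnicodeError:
            # The codec itself rejects labels that end up too long.
            bad.append(label)
            continue
        if not 0 < len(ace) <= MAX_LABEL_LENGTH:
            bad.append(label)
    return bad


print(bad_labels("m\u00fcller.de"))
print(bad_labels("a" * 64 + ".com"))
```

An empty list means every label passed; anything returned would need to be shortened before the name can be used in DNS.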