I was curious whether I should encode URLs with ASCII or UTF-8. I was under the impression that URLs cannot contain non-ASCII characters, but someone told me they can contain UTF-8, and when I searched around I couldn't find out which is true. Does anyone know?
There are two parts to this, but they both amount to "yes".
With IDNA, it is possible to register domain names using the full Unicode repertoire (with a few minor twists to prevent ambiguities and abuse).
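As a rough illustration of the domain-name part, the sketch below uses Python's built-in idna codec to convert a host name between its Unicode and ASCII forms. Note the assumptions: that codec implements the older IDNA 2003 rules, so it only approximates what a modern browser or registrar does, and müsic.example is a made-up host name.

```python
# Convert an internationalized host name to its ASCII (Punycode) form and back.
# Assumption: Python's built-in "idna" codec (IDNA 2003 rules) is good enough
# for illustration; the host name itself is fictional.
unicode_host = "müsic.example"

# Each non-ASCII label gets the "xn--" prefix plus its Punycode encoding.
ascii_host = unicode_host.encode("idna").decode("ascii")
print(ascii_host)  # xn--msic-0ra.example

# Decoding the ASCII form recovers the human-readable name.
round_trip = ascii_host.encode("ascii").decode("idna")
print(round_trip)  # müsic.example
```

Only the ASCII form travels over DNS; the Unicode form is purely a display convenience.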
The path part is not strictly regulated, but it's possible to encode arbitrary strings in the path. The browser could opt to display a human-readable rendering rather than an encoded path. However, this requires heuristics, as there is no way to specify the character set and encoding of the path.
So, http://xn--msic-0ra.example/mot%C3%B6rhead is a computer-readable encoded URL (a fictional example, not entirely correct) which could be displayed to the user as http://müsic.example/motörhead. The domain name is encoded as xn--msic-0ra.example in something called Punycode, and the path contains the label "motörhead" encoded as UTF-8 and then URL-encoded (the Unicode code point U+00F6 is represented by the two bytes 0xC3 0xB6 in UTF-8).
The path could also be mot%F6rhead, which is the same label in Latin-1. In this case, deducing a reasonable human-readable representation would be much harder, but perhaps the context of the surrounding characters could offer enough hints for a good guess. In isolation, %F6 could be pretty much anything, and %C3%B6 could be e.g. UTF-16.
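To make the path part concrete, here is a minimal sketch using Python's urllib.parse. It percent-encodes the label "motörhead" once as UTF-8 and once as Latin-1, then shows what happens when the decoder guesses the wrong encoding. The variable names are only illustrative.

```python
from urllib.parse import quote, unquote

label = "motörhead"

# Percent-encode the same label using two different byte encodings.
utf8_path = quote(label, encoding="utf-8")
latin1_path = quote(label, encoding="latin-1")
print(utf8_path)    # mot%C3%B6rhead
print(latin1_path)  # mot%F6rhead

# Decoding round-trips only when the guessed encoding matches the real one.
right_utf8 = unquote(utf8_path, encoding="utf-8")
right_latin1 = unquote(latin1_path, encoding="latin-1")
print(right_utf8, right_latin1)  # motörhead motörhead

# Wrong guesses: the URL carries no hint about which encoding was used.
utf8_read_as_latin1 = unquote(utf8_path, encoding="latin-1")
latin1_read_as_utf8 = unquote(latin1_path, encoding="utf-8")  # invalid UTF-8
print(utf8_read_as_latin1)  # motÃ¶rhead
print(latin1_read_as_utf8)  # mot\ufffdrhead, shown as a replacement character

# The two bytes 0xC3 0xB6 are also a valid single UTF-16 (big-endian) code unit.
as_utf16 = bytes.fromhex("C3B6").decode("utf-16-be")
print(hex(ord(as_utf16)))  # 0xc3b6
```

In practice UTF-8 is by far the most common choice for new URLs, but as the wrong-guess cases show, nothing in the URL itself guarantees it.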