What do % signs mean in a URL?

Posted 2019-02-10 23:43

When I copy and paste the URL of this Wikipedia article, it looks like this:

http://en.wikipedia.org/wiki/Gruy%C3%A8re_%28cheese%29

However, if you paste this back into the address bar, the percent signs disappear, and what appear to be Unicode characters (and maybe special URL characters) take their place.

Are these abbreviations for Unicode and special URL characters?

I'm used to seeing \u00ff, etc. in JavaScript.

Tags: url
4 answers
时光不老,我们不散
#2 · 2019-02-10 23:51

It is important to note that the % sign serves two primary purposes. One is to escape characters that have a special meaning in a URL, and the other is to encode characters outside the ASCII range, such as Unicode characters you cannot type directly with your keyboard. For example, %C3%A8 encodes è, and %2F encodes a forward slash /.

Using JavaScript we can create an encoding chart:

http://jsfiddle.net/CG8gx/3/

["\x00", "\x01", "\x02", "\x03", "\x04", "\x05", "\x06", "\x07", "\b", "\t", "\n", "\v", "\f", "\r", "\x0E", "\x0F", "\x10", "\x11", "\x12", "\x13", "\x14", "\x15", "\x16", "\x17", "\x18", "\x19", "\x1A", "\x1B", "\x1C", "\x1D", "\x1E", "\x1F", " ", "!", "\"", "#", "$", "%", "&", "'", "(", ")", "*", "+", ",", "-", ".", "/", "0", "1", "2", "3", "4", "5", "6", "7", "8", "9", ":", ";", "<", "=", ">", "?", "@", "A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "L", "M", "N", "O", "P", "Q", "R", "S", "T", "U", "V", "W", "X", "Y", "Z", "[", "\\", "]", "^", "_", "`", "a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m", "n", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z", "{", "|", "}", "~", "\x7F"]
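
For comparison, here is a minimal Python sketch of the same idea. The fiddle itself uses JavaScript and its code is not reproduced here, so treat this as an illustration only. It assumes Python 3 and `urllib.parse.quote` with `safe=""`, which leaves only letters, digits, and a few unreserved punctuation marks unescaped. The name `chart` is just a placeholder.

```python
from urllib.parse import quote

# Map each of the 128 ASCII characters to its percent-encoded form.
# safe="" means "/" is escaped too (quote() leaves it alone by default).
chart = {chr(code): quote(chr(code), safe="") for code in range(128)}

for char in ["/", " ", "%", "(", ")", "A"]:
    print(repr(char), "->", chart[char])
```

Characters such as `A` map to themselves, while `/`, space, and `%` come out as `%2F`, `%20`, and `%25`.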

爷的心禁止访问
#3 · 2019-02-10 23:54

It's just a different syntactical convention for what you're used to from JavaScript. URL syntax is simply different from that of JavaScript, in other words, and % is the way one introduces a two-hex-digit character code in that syntax.

Some characters must be escaped in order to be part of a URL/URI. For example, the / character has meaning; it's a metacharacter, in other words. If you need a / in the middle of a path component (which admittedly would be a little weird), you'd have to escape it. It's analogous to the need to escape quote characters in JavaScript string constants.
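
As a rough illustration of escaping a / inside a single path component, here is a small Python 3 sketch. The segment `AC/DC` and the `example.com` URL are made up for the example. Note that `urllib.parse.quote` treats `/` as safe by default, so `safe=""` is needed to escape it.

```python
from urllib.parse import quote, unquote

segment = "AC/DC"  # hypothetical path component containing a literal slash

# Without safe="", quote() would leave the "/" untouched.
escaped = quote(segment, safe="")
url = "http://example.com/wiki/" + escaped

print(url)               # http://example.com/wiki/AC%2FDC
print(unquote(escaped))  # AC/DC
```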

贪生不怕死
#4 · 2019-02-11 00:02

% in a URI is followed by two hexadecimal digits (0-9, A-F) and stands for the byte with that hex value. Escaping this way lets you write a URI containing characters that would otherwise have a special meaning or not be allowed there.

Common examples are %20 for a space and %5B and %5D for [ and ], respectively.
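
To check what a given escape stands for, you can read the two hex digits yourself. This is a minimal Python 3 sketch. `decode_escape` is a hypothetical helper written for this example, and it only handles single escapes in the ASCII range; `urllib.parse.unquote` is the standard-library way to do the same thing.

```python
from urllib.parse import unquote

def decode_escape(escape):
    """Turn one escape such as '%20' into its ASCII character (illustrative helper)."""
    code = int(escape[1:], 16)  # the two hex digits after the "%"
    return chr(code)

for escape in ["%20", "%5B", "%5C", "%5D"]:
    # Both columns should agree for ASCII escapes.
    print(escape, repr(decode_escape(escape)), repr(unquote(escape)))
```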

我命由我不由天
#5 · 2019-02-11 00:13

The reference you're looking for is RFC 3987: Internationalized Resource Identifiers, specifically the section on mapping IRIs to URIs.

RFC 3986: Uniform Resource Identifiers specifies that reserved characters must be percent-encoded, but it also specifies that percent-encoded characters are decoded to US-ASCII, which does not include characters such as è.

RFC 3987 specifies that non-ASCII characters should first be encoded as UTF-8 so they can be percent-encoded as per RFC 3986. If you'll permit me to illustrate in Python:

>>> 'è'.encode('utf-8')
b'\xc3\xa8'

Here I've asked Python to encode the Unicode è to a string of bytes using UTF-8. The bytes returned are 0xc3 and 0xa8. Percent-encoded, this looks like %C3%A8.
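
The byte-to-percent step can be spelled out by hand. This is only a sketch in Python 3: `percent_encode_utf8` is a name I made up for illustration, and unlike a real URL encoder it escapes every byte, including plain letters.

```python
def percent_encode_utf8(text):
    """Percent-encode every UTF-8 byte of text (illustrative, not a library function)."""
    return "".join("%{:02X}".format(byte) for byte in text.encode("utf-8"))

print(percent_encode_utf8("è"))  # %C3%A8
```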

The parentheses that also appear in your URL do fit in US-ASCII, so they are percent-escaped with their US-ASCII code points, which are also valid UTF-8.
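
Putting it together for the article title in your URL, here is a short round trip using Python 3's standard `urllib.parse` module. I am assuming UTF-8 throughout, which is the default for `quote` and `unquote`.

```python
from urllib.parse import quote, unquote

encoded = "Gruy%C3%A8re_%28cheese%29"

decoded = unquote(encoded)        # bytes are interpreted as UTF-8 by default
print(decoded)                    # Gruyère_(cheese)

re_encoded = quote(decoded, safe="")
print(re_encoded)                 # Gruy%C3%A8re_%28cheese%29
```

This is roughly what the browser does when it shows you the readable form in the address bar and sends the percent-encoded form over the wire.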

So, no, there is no simple 16×16 table—such a table could never represent the richness of Unicode. But there is a method to the apparent madness.
