I was looking at the encoding of Chinese characters on Wikipedia and I'm having trouble figuring out what they are using. For instance "的" is encoded as "%E7%9A%84" (see here). That's three bytes, however none of the encodings described on this page uses three bytes to represent Chinese characters. UTF-8 for instance uses 2 bytes.
I'm basically trying to match these three bytes to an actual character. Any suggestion on what encoding it could be?
though Unicode encodes it in 16 bits, utf8 breaks it down to 3 bytes.
The header of a wikipedia page includes this:
So the page is UTF-8.
The example you give is an IRI.
IRIs use the UTF8 encoding. UTF8 implements unicode, and in unicode, each character has a codepoint, that is between 0x4E00 and 0x9FFF (2 bytes) for all chinese characters.
But UTF8 doesn't encode characters by just storing their codepoint (UTF32 does that). Instead, it uses a more complex standard, that makes all chinese ideograms 2 or 3 bytes long.