Question:

I am pulling French emails from a mailbox and the emails contain accents. I believe it is using UTF8 encoding.

I have tried different UTF8 conversion methods I've found around the Internet but have been unsuccessful.

How, for example, in C#, do I convert this: Montr=C3=A9al to Montréal?

Edit: Also, it is inconsistent. Sometimes it may be like Montr& eacute;al. (The space after the ampersand is just added so the browser does not convert it.)

Thanks!! Mark

Answer 1:

That's not UTF-8. That's quoted-printable, which isn't quite the same sort of encoding as UTF-8 - it's more an "ASCII text to binary data" encoding.

Quoted printable will effectively allow you to convert the ASCII message into a byte array which can then be decoded as UTF-8.

I'm not sure whether there's any direct support in .NET for quoted printable encoding, which is somewhat bizarre... I may well have missed something.

Answer 2:

The UTF-8 encoding translates an array of bytes (8-bit numbers) to a string (or vice versa), i.e. it is a mapping between "numbers" and "characters". The set of characters it can represent is larger than the set of ASCII characters: for example, é can be encoded in UTF-8, but it is not part of ASCII.

Quoted-printable encoding translates an array of bytes (8-bit numbers) to a sequence of ASCII characters (actually a subset of them).

Thus, by combining both, you can "encode" a UTF-8 string as a sequence of characters drawn from a subset of ASCII (an ASCII string).
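
To make the combination concrete, here is a minimal C# sketch of the encoding direction: the text is first turned into bytes by a charset, and each byte is then written either as itself or as =XX. The class and method names are made up for illustration, and the sketch assumes single-line input: it ignores the 76-character line limit and the trailing-whitespace rules of real quoted-printable.

```csharp
using System;
using System.Text;

public static class QuotedPrintableEncodeSketch
{
    // Hypothetical helper, not a .NET API: text -> bytes (charset) -> ASCII (quoted-printable).
    public static string Encode(string text, Encoding charset)
    {
        var sb = new StringBuilder();
        foreach (byte b in charset.GetBytes(text))
        {
            // Space and printable ASCII pass through unchanged, except '=' itself.
            if (b >= 32 && b <= 126 && b != (byte)'=')
                sb.Append((char)b);
            else
                sb.Append('=').Append(b.ToString("X2")); // e.g. 0xC3 -> "=C3"
        }
        return sb.ToString();
    }

    public static void Main()
    {
        // é (U+00E9) is the two bytes C3 A9 in UTF-8.
        Console.WriteLine(Encode("Montr\u00E9al", Encoding.UTF8)); // Montr=C3=A9al
    }
}
```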

The same can be done with other encodings (e.g. ISO-8859-1). Thus you need two pieces of information:

- The given ASCII string is quoted-printable.
- The resulting byte array represents a string encoded as UTF-8.

Decoding quoted-printable thus has two steps:

First, create the byte array, say bytes[], via the quoted-printable rules, i.e.
- The substring =NM maps to the single byte whose hexadecimal value is NM (that is, N*16 + M)
- A = at the very end of a line is a soft line break and produces no bytes
- Any other character maps to its ASCII byte (note that the similar "Q" encoding used in encoded-words has an additional mapping of _ to space)
Then interpret the byte array as a UTF-8 string.
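
The two decoding steps could be sketched in C# roughly as follows. This is a hand-rolled illustration rather than a built-in .NET facility; the class and method names are hypothetical. It assumes the input is pure ASCII, and the caller has to supply the charset (UTF-8 here), which in a real e-mail comes from the Content-Type header.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class QuotedPrintableDecodeSketch
{
    // Hypothetical helper: quoted-printable ASCII -> bytes -> text in the given charset.
    public static string Decode(string input, Encoding charset)
    {
        var bytes = new List<byte>(input.Length);
        for (int i = 0; i < input.Length; i++)
        {
            char c = input[i];
            if (c != '=')
            {
                bytes.Add((byte)c); // any other character maps to its ASCII byte
                continue;
            }

            bool hasTwoMore = i + 2 < input.Length;
            if (hasTwoMore && Uri.IsHexDigit(input[i + 1]) && Uri.IsHexDigit(input[i + 2]))
            {
                // =NM maps to the byte N*16 + M
                bytes.Add(Convert.ToByte(input.Substring(i + 1, 2), 16));
                i += 2;
            }
            else if (hasTwoMore && input[i + 1] == '\r' && input[i + 2] == '\n')
            {
                i += 2; // soft line break (=CRLF): emits nothing
            }
            else if (i + 1 < input.Length && input[i + 1] == '\n')
            {
                i += 1; // soft line break with a bare LF: emits nothing
            }
            else
            {
                bytes.Add((byte)'='); // malformed escape: kept literally (a choice made for this sketch)
            }
        }

        // Second step: interpret the bytes in the declared charset.
        return charset.GetString(bytes.ToArray());
    }

    public static void Main()
    {
        string decoded = Decode("Montr=C3=A9al", Encoding.UTF8);
        Console.WriteLine(decoded == "Montr\u00E9al"); // True
    }
}
```

The other form mentioned in the question, Montr&eacute;al, is an HTML entity rather than quoted-printable; System.Net.WebUtility.HtmlDecode handles that case and can be applied to the text after this step.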