UTF8 (Quoted Printable) conversion in C# question

Posted 2020-04-07 19:31

I am pulling French emails from a mailbox and the emails contain accents. I believe it is using UTF8 encoding.

I have tried different UTF8 conversion methods I've found around the Internet but have been unsuccessful.

How, for example, in C#, do I convert this: Montr=C3=A9al to Montréal?

Edit: Also, it is inconsistent. Sometimes it may be like Montr& eacute;al. (The space after the ampersand is just added so the browser does not convert it.)

Thanks!! Mark

Tags: c# utf-8
2 answers
We Are One
#2 · 2020-04-07 19:59

That's not UTF-8. That's quoted-printable, which isn't quite the same sort of encoding as UTF-8: it's more of a "binary data to ASCII text" encoding.

Quoted printable will effectively allow you to convert the ASCII message into a byte array which can then be decoded as UTF-8.

I'm not sure whether there's any direct support in .NET for quoted printable encoding, which is somewhat bizarre... I may well have missed something.

ら.Afraid
#3 · 2020-04-07 20:15

UTF-8 is an encoding that maps between an array of bytes (8-bit numbers) and a string of characters, in either direction. I.e. there is a mapping between "numbers" and "characters". The set of characters it can represent is larger than the set of ASCII characters: for example, é can be represented in UTF-8 (as the two bytes C3 A9) but is not part of ASCII.

Quoted-printable encoding translates an array of bytes (8-bit numbers) to a sequence of ASCII characters (actually a subset of them).

Thus, by combining both, you can encode a string via UTF-8 into a sequence of characters drawn from a subset of ASCII (an ASCII string).
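To make the combination concrete, here is a minimal sketch of the encoding direction, assuming UTF-8 as the character encoding. The class and method names (`QuotedPrintableSketch.Encode`) are made up for illustration and are not part of .NET. The sketch also skips the line-length and trailing-whitespace rules a real mail encoder has to follow, and it escapes line breaks instead of preserving them.

```csharp
using System.Text;

// Hypothetical helper for illustration only, not a .NET API.
public static class QuotedPrintableSketch
{
    public static string Encode(string text)
    {
        // Step 1: string -> bytes, using UTF-8.
        byte[] bytes = Encoding.UTF8.GetBytes(text);

        // Step 2: bytes -> ASCII characters, using the =XX escape.
        var sb = new StringBuilder(bytes.Length * 3);
        foreach (byte b in bytes)
        {
            // Printable ASCII (except '=') and space are copied through.
            bool literal = (b >= 33 && b <= 126 && b != (byte)'=') || b == (byte)' ';
            if (literal)
                sb.Append((char)b);
            else
                sb.Append('=').Append(b.ToString("X2")); // two uppercase hex digits
        }
        return sb.ToString();
    }
}
```

With this sketch, `QuotedPrintableSketch.Encode("Montréal")` yields `Montr=C3=A9al`, because é becomes the two UTF-8 bytes C3 A9 and each of them is escaped separately.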

The same can be done with other encodings (e.g. ISO-8859-1). Thus you need two pieces of information:

  • The given ASCII string is quoted printable.
  • The resulting byte array represents a string having encoding UTF-8.

Decoding quoted-printable thus has two steps:

  1. Create the byte array, say bytes[], using the quoted-printable rules, i.e.

    • The substring =NM maps to the byte with hexadecimal value NM (that is, N*16 + M).
    • An = at the very end of a line is a soft line break and produces no byte.
    • Any other character maps to its ASCII byte. (Note that the similar Q encoding used in encoded-words additionally maps _ to a space.)
  2. Then interpret the byte array as a UTF-8 string.
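The two steps above can be sketched in C# roughly as follows. This is a minimal sketch, not a complete mail decoder: `QuotedPrintable.Decode` and `HexValue` are names invented here (they are not .NET APIs), the input is assumed to be plain ASCII, and the charset defaults to UTF-8 although in a real message it should come from the Content-Type header.

```csharp
using System.Collections.Generic;
using System.Text;

// Hypothetical helper for illustration only, not a .NET API.
public static class QuotedPrintable
{
    public static string Decode(string input, Encoding encoding = null)
    {
        encoding = encoding ?? Encoding.UTF8; // assumption: UTF-8 unless told otherwise
        var bytes = new List<byte>(input.Length);

        // Step 1: build the byte array using the quoted-printable rules.
        for (int i = 0; i < input.Length; i++)
        {
            char c = input[i];
            if (c != '=')
            {
                // Any other character maps to its ASCII byte (input assumed ASCII).
                bytes.Add(unchecked((byte)c));
                continue;
            }

            int hi = i + 1 < input.Length ? HexValue(input[i + 1]) : -1;
            int lo = i + 2 < input.Length ? HexValue(input[i + 2]) : -1;
            if (hi >= 0 && lo >= 0)
            {
                bytes.Add((byte)(hi * 16 + lo)); // =NM -> byte N*16 + M
                i += 2;
            }
            else if (i + 2 < input.Length && input[i + 1] == '\r' && input[i + 2] == '\n')
            {
                i += 2; // soft line break "=\r\n": produces no byte
            }
            else if (i + 1 < input.Length && input[i + 1] == '\n')
            {
                i += 1; // tolerate a bare "=\n" soft line break
            }
            else
            {
                bytes.Add((byte)'='); // malformed escape: keep the '=' literally
            }
        }

        // Step 2: interpret the bytes in the given character encoding.
        return encoding.GetString(bytes.ToArray());
    }

    // Returns 0..15 for a hex digit, or -1 if c is not a hex digit.
    private static int HexValue(char c)
    {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        return -1;
    }
}
```

Calling `QuotedPrintable.Decode("Montr=C3=A9al")` gives `Montréal`. For a message declared as ISO-8859-1, pass `Encoding.GetEncoding("ISO-8859-1")` as the second argument instead.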
