I am pulling French emails from a mailbox and the emails contain accents. I believe they use UTF-8 encoding.
I have tried different UTF-8 conversion methods I've found around the Internet, but none of them has worked.
How, for example, in C#, do I convert this: Montr=C3=A9al to Montréal?
Edit: Also, it is inconsistent. Sometimes it may be like Montr& eacute;al. (The space after the ampersand is just added so the browser does not convert it.)
Thanks!!
Mark
That's not UTF-8. That's quoted-printable, which isn't quite the same sort of encoding as UTF-8 - it's more a "bytes to ASCII text" encoding.
Decoding quoted-printable will effectively convert the ASCII message into a byte array, which can then be decoded as UTF-8.
I'm not sure whether there's any direct support in .NET for quoted printable encoding, which is somewhat bizarre... I may well have missed something.
The UTF-8 encoding translates an array of bytes (8-bit numbers) to a string (or vice versa), i.e. there is a mapping between "numbers" and "characters". The set of characters it covers is larger than the set of ASCII characters: for example, é can be represented in UTF-8 but is not part of ASCII.
Quoted-printable encoding translates an array of bytes (8-bit numbers) to a sequence of ASCII characters (actually a subset of them).
Thus, combining both, you can "encode" a UTF-8 string into an ASCII string that uses only a subset of the ASCII characters.
The same can be done with other encodings (e.g. ISO-8859-1). Thus you need both pieces of information:
- The given ASCII string is quoted printable.
- The resulting byte array represents a string having encoding UTF-8.
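To illustrate why both pieces of information matter, here is a minimal C# sketch of the encoding direction. The class `QuotedPrintableDemo` and its `Encode` method are hypothetical names made up for this example, not a .NET API, and the sketch ignores parts of the real quoted-printable rules such as line-length limits and trailing whitespace. It only shows that the same text yields different quoted-printable output depending on the character encoding chosen.

```csharp
using System;
using System.Text;

static class QuotedPrintableDemo
{
    // Hypothetical helper: turns text into bytes with the given charset,
    // then writes each byte either as a literal ASCII character or as =XX.
    public static string Encode(string text, Encoding charset)
    {
        var sb = new StringBuilder();
        foreach (byte b in charset.GetBytes(text))
        {
            // Printable ASCII other than '=' is passed through unchanged.
            if (b >= 0x20 && b <= 0x7E && b != (byte)'=')
                sb.Append((char)b);
            else
                sb.Append('=').Append(b.ToString("X2")); // two upper-case hex digits
        }
        return sb.ToString();
    }

    static void Main()
    {
        string city = "Montr\u00E9al"; // "Montréal"

        // é is two bytes (C3 A9) in UTF-8 ...
        Console.WriteLine(Encode(city, Encoding.UTF8));                      // Montr=C3=A9al

        // ... but a single byte (E9) in ISO-8859-1.
        Console.WriteLine(Encode(city, Encoding.GetEncoding("ISO-8859-1"))); // Montr=E9al
    }
}
```

So a decoder that sees `=E9` or `=C3=A9` cannot know what character is meant unless it also knows the character encoding, which in an email normally comes from the charset declared in the message headers.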
Decoding quoted-printable thus has two steps:
First, create the byte array, say bytes[], via the quoted-printable rules, i.e.
- The substring =NM maps to the byte whose hexadecimal value is NM (that is, N*16 + M)
- Any other character maps to its ASCII byte
(Note that the similar Q-encoding used in encoded-words has an additional mapping from _ to space.)
Then interpret the byte array as a UTF-8 string.
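The two steps above can be sketched in C# roughly as follows. `QuotedPrintableDecoder` and `Decode` are hypothetical names for this example, not a built-in .NET API. The sketch assumes the input is pure ASCII and covers only the two rules listed above; it does not handle soft line breaks (an `=` at the end of a line), which real email bodies may contain.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

static class QuotedPrintableDecoder
{
    // Hypothetical helper: step 1 builds the byte array, step 2 decodes it.
    public static string Decode(string input, Encoding charset)
    {
        var bytes = new List<byte>(input.Length);

        for (int i = 0; i < input.Length; i++)
        {
            char c = input[i];

            // Rule 1: "=NM" maps to the byte N*16 + M (N and M are hex digits).
            if (c == '=' && i + 2 < input.Length
                && Uri.IsHexDigit(input[i + 1]) && Uri.IsHexDigit(input[i + 2]))
            {
                bytes.Add((byte)(Uri.FromHex(input[i + 1]) * 16 + Uri.FromHex(input[i + 2])));
                i += 2; // skip the two hex digits just consumed
            }
            else
            {
                // Rule 2: any other character maps to its ASCII byte
                // (assumes the input really is ASCII).
                bytes.Add((byte)c);
            }
        }

        // Step 2: interpret the bytes using the declared character encoding.
        return charset.GetString(bytes.ToArray());
    }

    static void Main()
    {
        string decoded = Decode("Montr=C3=A9al", Encoding.UTF8);
        Console.WriteLine(decoded == "Montr\u00E9al"); // True
    }
}
```

Passing the encoding as a parameter keeps the two steps separate: the same `Decode` call with `Encoding.GetEncoding("ISO-8859-1")` would handle a message whose bytes were produced from ISO-8859-1 text instead of UTF-8.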