I use Visual Studio 2010 and C# to read my Gmail inbox over IMAP. It works like a charm, but I think Unicode is not fully supported, as I cannot easily get Persian (Farsi) strings.
For instance, my string is سلام, but IMAP gives me "=?utf-8?B?2LPZhNin2YU=?=".
How can I convert it back to the original string? Any tips on converting UTF-8 to a string?
Let's have a look at the meaning of the MIME encoding:
=?utf-8?B?...something...?=
    ^   ^
    |   +--- The bytes are Base64 encoded
    |
    +---- The string is UTF-8 encoded
So, to decode this, take the ...something... out of your string (2LPZhNin2YU= in your case) and then

1. reverse the Base64 encoding:
   var bytes = Convert.FromBase64String("2LPZhNin2YU=");
2. interpret the bytes as a UTF-8 string:
   var text = Encoding.UTF8.GetString(bytes);

text should now contain the desired result.
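The two steps above can be combined into one small helper that also pulls the payload out of the encoded word. This is only a minimal sketch: the class name EncodedWordDemo and the method DecodeSingle are made up for illustration, and it assumes the input is exactly one well-formed Base64 ("B") encoded word.

```csharp
using System;
using System.Text;

public static class EncodedWordDemo
{
    // Hypothetical helper: decodes exactly one =?charset?B?payload?= word.
    public static string DecodeSingle(string word)
    {
        // Splitting on '?' yields: "=", charset, encoding, payload, "=".
        // Base64 never contains '?', so the payload stays in one piece.
        string[] parts = word.Split('?');
        if (parts.Length != 5 || !parts[2].Equals("B", StringComparison.OrdinalIgnoreCase))
            throw new FormatException("Not a Base64 encoded word: " + word);

        string charset = parts[1];   // e.g. "utf-8"
        string payload = parts[3];   // e.g. "2LPZhNin2YU="

        // Step 1: reverse the Base64 encoding.
        byte[] bytes = Convert.FromBase64String(payload);
        // Step 2: interpret the bytes using the declared charset.
        return Encoding.GetEncoding(charset).GetString(bytes);
    }

    public static void Main()
    {
        // The console font may not render Persian glyphs, even though
        // the decoded string itself is correct.
        Console.WriteLine(DecodeSingle("=?utf-8?B?2LPZhNin2YU=?="));
    }
}
```

Using the charset named in the header instead of hard-coding Encoding.UTF8 means the same helper should also cope with senders that use a different charset.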
A description of this format can be found in Wikipedia:
- http://en.wikipedia.org/wiki/MIME#Encoded-Word
What you have is a MIME encoded string. .NET does not include libraries for MIME decoding, but you can either implement this yourself or use a library.
Here it is:
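// Requires: using System; using System.Linq; using System.Text; using System.Text.RegularExpressions;
public static string Decode(string s)
{
    // Matches Base64 ("B") encoded words of the form =?charset?B?data?=
    var matches = Regex.Matches(s ?? "", @"=\?([^?]+)\?[Bb]\?([^?]*)\?=");
    // Decode each encoded word and concatenate the results.
    return String.Join("", matches.Cast<Match>().Select(m =>
    {
        string charset = m.Groups[1].Value;
        string data = m.Groups[2].Value;
        byte[] b = Convert.FromBase64String(data);
        return Encoding.GetEncoding(charset).GetString(b);
    }));
}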
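Note that the method above only handles the Base64 ("B") variant. The encoded-word format also allows a quoted-printable-like "Q" variant, which some mail clients use. Below is a rough sketch of how both could be handled. The class name MimeWords and the helper FromQ are hypothetical, and it does not implement every rule of the format; for example, it does not drop the whitespace between adjacent encoded words.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

public static class MimeWords
{
    // =?charset?B-or-Q?payload?=
    private static readonly Regex Word =
        new Regex(@"=\?([^?]+)\?([bBqQ])\?([^?]*)\?=");

    // Replaces each encoded word in s with its decoded text and
    // leaves everything outside the encoded words untouched.
    public static string Decode(string s)
    {
        return Word.Replace(s ?? "", m =>
        {
            Encoding enc = Encoding.GetEncoding(m.Groups[1].Value);
            string payload = m.Groups[3].Value;
            byte[] bytes = m.Groups[2].Value.ToUpperInvariant() == "B"
                ? Convert.FromBase64String(payload)
                : FromQ(payload);
            return enc.GetString(bytes);
        });
    }

    // "Q" encoding: '_' means space, "=XX" is a hex-encoded byte,
    // and any other character stands for itself.
    private static byte[] FromQ(string payload)
    {
        var bytes = new List<byte>();
        for (int i = 0; i < payload.Length; i++)
        {
            char c = payload[i];
            if (c == '_')
            {
                bytes.Add(0x20);
            }
            else if (c == '=' && i + 2 < payload.Length)
            {
                bytes.Add(Convert.ToByte(payload.Substring(i + 1, 2), 16));
                i += 2; // skip the two hex digits
            }
            else
            {
                bytes.Add((byte)c);
            }
        }
        return bytes.ToArray();
    }
}
```

Unlike the version above, this sketch uses Regex.Replace, so plain text around the encoded words (as in a subject line that is only partly encoded) is kept rather than discarded. Malformed Base64 or hex input will throw a FormatException, which a real mail reader would probably want to catch.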