How to Decode “=?utf-8?B?…?=” to string in C#

Published 2020-02-06 05:14

Question:

I am using Visual Studio 2010 and C# to read a Gmail inbox over IMAP. It works like a charm, but I think Unicode is not fully supported, as I cannot easily get Persian (Farsi) strings.

For instance, my string is سلام, but IMAP gives me "=?utf-8?B?2LPZhNin2YU=?=".

How can I convert it back to the original string? Any tips on converting UTF-8 to a string?

Answer 1:

Let's have a look at the meaning of the MIME encoding:

=?utf-8?B?...something...?=
    ^   ^
    |   +--- The bytes are Base64 encoded
    |
    +---- The string is UTF-8 encoded

So, to decode this, take the ...something... out of your string (2LPZhNin2YU= in your case) and then

  1. reverse the Base64 encoding

    var bytes = Convert.FromBase64String("2LPZhNin2YU=");
    
  2. interpret the bytes as a UTF-8 string

    var text = Encoding.UTF8.GetString(bytes);
    

text should now contain the desired result.
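
The two steps above can be put together as a minimal sketch. The class and method names (`EncodedWordDemo`, `DecodeUtf8Base64Word`) are made up for illustration, and the code assumes the input is exactly one encoded-word that uses UTF-8 and Base64; anything else is rejected.

```csharp
using System;
using System.Text;

public static class EncodedWordDemo
{
    // Hypothetical helper: decodes a single "=?utf-8?B?...?=" encoded-word.
    public static string DecodeUtf8Base64Word(string word)
    {
        // Splitting on '?' yields: "=", charset, encoding, payload, "=".
        string[] parts = word.Split('?');
        if (parts.Length != 5 || parts[0] != "=" || parts[4] != "=")
            throw new FormatException("Not a MIME encoded-word: " + word);

        // This sketch only covers the case from the question: UTF-8 + Base64.
        if (!string.Equals(parts[1], "utf-8", StringComparison.OrdinalIgnoreCase) ||
            !string.Equals(parts[2], "B", StringComparison.OrdinalIgnoreCase))
            throw new NotSupportedException("Only utf-8 with B encoding is handled here.");

        // Step 1: reverse the Base64 encoding.
        byte[] bytes = Convert.FromBase64String(parts[3]);

        // Step 2: interpret the bytes as a UTF-8 string.
        return Encoding.UTF8.GetString(bytes);
    }

    public static void Main()
    {
        // Make sure the console can show non-ASCII output.
        Console.OutputEncoding = Encoding.UTF8;
        Console.WriteLine(DecodeUtf8Base64Word("=?utf-8?B?2LPZhNin2YU=?="));
    }
}
```

Splitting on `?` works here because the Base64 alphabet does not contain a question mark, so the payload always ends up in a single array element.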


A description of this format can be found in Wikipedia:

  • http://en.wikipedia.org/wiki/MIME#Encoded-Word


Answer 2:

What you have is a MIME-encoded string (an "encoded-word"). .NET does not include libraries for MIME decoding, but you can either implement this yourself or use a library.
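
For the implement-it-yourself route, a rough sketch could look like the one below. It is not a complete MIME decoder. It assumes the encoded-word rules from RFC 2047: `B` means Base64, `Q` is a quoted-printable variant where `_` stands for a space and `=XX` for a hex byte, and whitespace between two adjacent encoded-words is ignored. The names `MimeWordDecoder`, `Decode`, `DecodeMatch` and `DecodeQ` are hypothetical, and the set of charsets that `Encoding.GetEncoding` accepts depends on the runtime.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

public static class MimeWordDecoder
{
    // =?charset?encoding?payload?=  where encoding is B or Q (either case).
    private static readonly Regex EncodedWord = new Regex(
        @"=\?([^?\s]+)\?([BbQq])\?([^?\s]*)\?=");

    // Replaces every encoded-word in the input; plain text is kept as-is.
    public static string Decode(string input)
    {
        if (string.IsNullOrEmpty(input)) return input ?? "";

        // Drop whitespace that only separates two adjacent encoded-words.
        string joined = Regex.Replace(input, @"(\?=)\s+(?==\?)", "$1");
        return EncodedWord.Replace(joined, DecodeMatch);
    }

    private static string DecodeMatch(Match m)
    {
        // Throws ArgumentException if the runtime does not know the charset.
        Encoding charset = Encoding.GetEncoding(m.Groups[1].Value);
        string payload = m.Groups[3].Value;

        byte[] bytes = m.Groups[2].Value.ToUpperInvariant() == "B"
            ? Convert.FromBase64String(payload)
            : DecodeQ(payload);
        return charset.GetString(bytes);
    }

    // "Q" encoding: '_' is a space, "=XX" is one byte in hex,
    // every other character stands for itself (ASCII).
    private static byte[] DecodeQ(string payload)
    {
        var bytes = new List<byte>();
        for (int i = 0; i < payload.Length; i++)
        {
            char c = payload[i];
            if (c == '_')
            {
                bytes.Add(0x20);
            }
            else if (c == '=' && i + 2 < payload.Length)
            {
                bytes.Add(Convert.ToByte(payload.Substring(i + 1, 2), 16));
                i += 2; // skip the two hex digits just consumed
            }
            else
            {
                bytes.Add((byte)c);
            }
        }
        return bytes.ToArray();
    }
}
```

With this sketch, `MimeWordDecoder.Decode("=?utf-8?B?2LPZhNin2YU=?=")` gives back سلام, and a mixed header such as `Re: =?utf-8?Q?caf=C3=A9_au_lait?=` becomes `Re: café au lait`. Malformed payloads surface as `FormatException`, which a real mail client would probably want to catch and fall back to the raw text.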



Answer 3:

Here it is:

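    // Requires: using System; using System.Linq; using System.Text;
    //           using System.Text.RegularExpressions;
    // Decodes the Base64 ("B") encoded-words found in s and concatenates them.
    // Text outside encoded-words is dropped; "Q" encoded-words are not handled.
    public static string Decode(string s)
    {
        var matches = Regex.Matches(s ?? "", @"=\?([^?]+)\?B\?([^?]*)\?=", RegexOptions.IgnoreCase);
        return String.Join("", matches.Cast<Match>().Select(m =>
        {
            string charset = m.Groups[1].Value;         // e.g. "utf-8"
            string data = m.Groups[2].Value;            // Base64 payload
            byte[] b = Convert.FromBase64String(data);  // raw bytes
            return Encoding.GetEncoding(charset).GetString(b);
        }));
    }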