Transforming string to UTF8

Posted 2019-07-24 09:50

Question:

I have a string that I receive from an email in C#, and I want to display it correctly. I know the text comes in as Encoding.Default. According to this answer, I have to convert it to UTF-8, so I tried this code:

byte[] bytes = Encoding.Default.GetBytes(input);
string strResult = Encoding.UTF8.GetString(bytes);

It works, but some characters are not converted. In the webmail interface, the original string is:

باسلام همکار گرامی شماره 53018 مربوط به دبیرخانه ستاد می باشد لطفا اصلاح فرمائید 

When I convert the string with the code above, I get this result:

باس �?ا�? �?�?�?ار گرا�?�? �?ا�?�? ش�?ار�? 53018  �?رب�?ط ب�? د ب�?رخا�?�? ستاد �?�? باشد �?طفا اص�?اح فر�?ائ�?د�? 

Any ideas?
Update: the content of the input variable is:

اÙزاÙØ´ تسÙÙÙات \r\n \r\n\r\n باس Ùا٠ÙÙÙار گراÙÙ ÙاÙÙ Ø´ÙارÙ

Answer 1:

I finally solved the problem. The UTF-8 code unit values (the raw bytes) were stored as a sequence of 16-bit code units in the C# string, one byte value per char. So we should verify that each code unit is within the range of a byte, copy those values into a byte array, and then decode that UTF-8 byte sequence into a regular UTF-16 string:

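// Each char of utf8String holds one UTF-8 byte value, so copy them one-to-one.
byte[] utf8Bytes = new byte[utf8String.Length];
for (int i = 0; i < utf8String.Length; ++i) {
    // Note: in the default unchecked context this cast truncates values above 0xFF.
    utf8Bytes[i] = (byte)utf8String[i];
}
// Decode the recovered bytes as UTF-8.
var result = Encoding.UTF8.GetString(utf8Bytes, 0, utf8Bytes.Length);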
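The range check mentioned above is not actually performed by the plain cast, so here is a minimal sketch of the same idea with the check made explicit. The names MojibakeRepair and Repair are hypothetical, not part of any library, and the sample in Main only simulates the damage by decoding UTF-8 bytes as ISO-8859-1; the real email may have been garbled differently.

```csharp
using System;
using System.Text;

static class MojibakeRepair
{
    // Hypothetical helper: reinterprets a string whose chars each hold one
    // UTF-8 byte value (0..255) as the UTF-8 text it was meant to be.
    public static string Repair(string garbled)
    {
        if (garbled == null) throw new ArgumentNullException(nameof(garbled));

        var bytes = new byte[garbled.Length];
        for (int i = 0; i < garbled.Length; i++)
        {
            char c = garbled[i];
            // A char above 0xFF cannot be a single byte, so this input is not
            // the kind of mis-decoded text this helper can repair.
            if (c > 0xFF)
                throw new ArgumentException(
                    $"Character U+{(int)c:X4} at index {i} does not fit in a byte.",
                    nameof(garbled));
            bytes[i] = (byte)c;
        }
        // Decode the recovered byte sequence as UTF-8.
        return Encoding.UTF8.GetString(bytes);
    }
}

static class Program
{
    static void Main()
    {
        string original = "باسلام همکار گرامی";

        // Simulate the damage: UTF-8 bytes decoded one byte per char.
        byte[] utf8 = Encoding.UTF8.GetBytes(original);
        string garbled = Encoding.GetEncoding("ISO-8859-1").GetString(utf8);

        string repaired = MojibakeRepair.Repair(garbled);
        Console.WriteLine(repaired == original); // True
    }
}
```

Throwing on an out-of-range char is a design choice for this sketch: it makes a wrong assumption about the input visible instead of silently producing different garbage.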

So for this input:

اÙزاÙØ´ تسÙÙÙات \r\n\r\n\r\n<p>باسÙا٠ÙÙÙار گراÙÙ ÙاÙÙ Ø´ÙارÙ&nbsp;53018 &nbsp;ÙربÙØ· ب٠د بÙرخاÙ٠ستاد Ù٠باشد ÙØ·Ùا اصÙاح ÙرÙائÙد\r\n\r\n

I get the correct result:

افزايش تسهيلات \r\n\r\n\r\n<p>باسلام همكار گرامي نامه شماره&nbsp;53018 &nbsp;مربوط به د بيرخانه ستاد مي باشد لطفا اصلاح فرمائيد\r\n\r\n \r\n\r\n

PS: To replace the line-break characters with spaces, I use this code:

result = result.Replace('\r', ' ').Replace('\n', ' ');
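Replacing each '\r' and '\n' with a space leaves runs of several spaces where the text had consecutive line breaks. As an alternative sketch, assuming it is acceptable to collapse every run of whitespace into a single space, a regular expression can do both steps at once. The names WhitespaceCleanup and Collapse are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

static class WhitespaceCleanup
{
    // Hypothetical helper: replaces every run of whitespace (including
    // "\r\n" sequences) with one space and trims both ends.
    public static string Collapse(string text)
    {
        if (text == null) throw new ArgumentNullException(nameof(text));
        return Regex.Replace(text, @"\s+", " ").Trim();
    }
}

static class Program
{
    static void Main()
    {
        string result = "first line \r\n\r\n\r\n second line\r\n\r\n";
        Console.WriteLine(WhitespaceCleanup.Collapse(result)); // first line second line
    }
}
```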