I have a string that I receive from email via C# and I want to display it in a correct format. I know the encoding in coming in as Encoding.Default
, According to this answer I have to convert it to utf8, So I tried this code:
byte[] bytes = Encoding.Default.GetBytes(input);
string strResult = Encoding.UTF8.GetString(bytes);
It works, but it can't convert some characters:
Actually in web mail interface Original string is:
باسلام همکار گرامی شماره 53018 مربوط به دبیرخانه ستاد می باشد لطفا اصلاح فرمائید
When I convert the string with the code I give this result:
باس �?ا�? �?�?�?ار گرا�?�? �?ا�?�? ش�?ار�? 53018 �?رب�?ط ب�? د ب�?رخا�?�? ستاد �?�? باشد �?طفا اص�?اح فر�?ائ�?د�?
Any idea?
Update:
PS: The content of the input variable:
اÙزاÙØ´ تسÙÙÙات \r\n \r\n\r\n باس Ùا٠ÙÙÙار گراÙÙ ÙاÙÙ Ø´ÙارÙ
Finally solved the problem (+), As you know UTF-8 code unit values have been stored as a sequence of 16-bit code units in a C# string, So we should verify that each code unit is within the range of a byte, First we should copy those values into bytes and then convert the new UTF-8 byte sequence into UTF-16:
byte[] utf8Bytes = new byte[utf8String.Length];
for (int i=0;i<utf8String.Length;++i) {
utf8Bytes[i] = (byte)utf8String[i];
}
var result = Encoding.UTF8.GetString(utf8Bytes,0,utf8Bytes.Length);
So for this input:
اÙزاÙØ´ تسÙÙÙات \r\n\r\n\r\n<p>باسÙا٠ÙÙÙار گراÙÙ ÙاÙÙ Ø´Ùار٠53018 ÙربÙØ· ب٠د بÙرخاÙ٠ستاد Ù٠باشد ÙØ·Ùا اصÙØ§Ø ÙرÙائÙد\r\n\r\n
I get the correct result:
افزايش تسهيلات \r\n\r\n\r\n<p>باسلام همكار گرامي نامه شماره 53018 مربوط به د بيرخانه ستاد مي باشد لطفا اصلاح فرمائيد\r\n\r\n \r\n\r\n
PS: for removing extra characters I use this code:
result = result.Replace('\r', ' ').Replace('\n', ' ').ToString();