how to convert unicode text to utf8 text readable?

2019-07-17 20:39发布

问题:

I got a serious problem regarding Unicode and utf8, I saved a paragraph of Arabic/Persian text file into notepad and saved it, now I saw my information like

Êæ Çíä ÓæÑÓ ÈÑäÇãå ÚÏÏ ÏáÎæÇåí Ñæ ÇÒ æÑæÏí ãííÑå æ Èå Øæá åãæä ÚÏÏ ãËáËí Ñæ ÑÓã ãí ˜äå 

my question is how to get back my data, it is important for me to get this data back, thanks in advance

回答1:

The paragraph was scrambled by saving as code page 1256 (Arabic/Persian), then interpreted as code page 1252 (Western Europe), and finally saved as Unicode text. You can use C# to reverse this procedure:

string scrambled = "Êæ Çíä ÓæÑÓ ÈÑäÇãå ÚÏÏ ÏáÎæÇåí Ñæ ÇÒ æÑæÏí ãííÑå æ " + 
                   "Èå Øæá åãæä ÚÏÏ ãËáËí Ñæ ÑÓã ãí ˜äå";
byte[] bytes = Encoding.GetEncoding("windows-1252").GetBytes(scrambled);
string plainText = Encoding.GetEncoding("windows-1256").GetString(bytes);
Console.WriteLine(text);

The plain text output is: "تو اين سورس برنامه عدد دلخواهي رو از ورودي ميگيره و به طول همون عدد مثلثي رو رسم مي کنه"



回答2:

On Linux you can use Gedit to open it as a 1256 encoded file:

gedit shahnameh.txt --encoding WINDOWS-1256

You can do the same work via gui. You just need select the correct encoding from "open" dialog box when opening a file. It should be at the bottom of the open dialog.