How to read a Chinese text file from C#?

Published 2019-06-24 14:55

Question:

How can I read a Chinese text file using C#? My current code doesn't display the correct characters:

try
{    
    using (StreamReader sr = new StreamReader(path,System.Text.Encoding.UTF8))
    {
        // Read the file one line at a time and echo each line.
        string c = null;

        while (sr.Peek() >= 0)
        {
            c = sr.ReadLine();
            Console.WriteLine(c);
        }
    }
}
catch (Exception e)
{
    Console.WriteLine("The process failed: {0}", e.ToString());
}

Answer 1:

You need to use the right encoding for the file. Do you know what that encoding is? It might be UTF-16 (Encoding.Unicode in .NET), or possibly something like Big5. Really, though, you should find out for sure rather than guess.
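One concrete way to find out, at least for Unicode files, is to look for a byte order mark (BOM) at the start of the file. Note that new StreamReader(path, encoding) already checks for a BOM by default and switches to the encoding it announces, so the files that really depend on the encoding argument are the BOM-less ones (GB2312, Big5, and some UTF-8 files). A minimal sketch; BomSniffer and FromBom are hypothetical names, and a null result means the encoding is still unknown, not that the file isn't Unicode:

```csharp
using System.Text;

static class BomSniffer
{
    // Hypothetical helper: returns the encoding announced by a byte order mark
    // at the start of the data, or null when there is none. BOM-less files
    // (GB2312, Big5, many UTF-8 files) return null, meaning "still unknown".
    // UTF-32 BOMs are not checked (FF FE 00 00 would be reported as UTF-16 LE).
    public static Encoding FromBom(byte[] data)
    {
        if (data.Length >= 3 && data[0] == 0xEF && data[1] == 0xBB && data[2] == 0xBF)
            return Encoding.UTF8;              // UTF-8 with BOM
        if (data.Length >= 2 && data[0] == 0xFF && data[1] == 0xFE)
            return Encoding.Unicode;           // UTF-16 little-endian
        if (data.Length >= 2 && data[0] == 0xFE && data[1] == 0xFF)
            return Encoding.BigEndianUnicode;  // UTF-16 big-endian
        return null;                           // no BOM: encoding unknown
    }
}
```

For example, BomSniffer.FromBom(File.ReadAllBytes(path)); for large files you would read only the first few bytes instead.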

As leppie's answer mentioned, the problem might also be the console's capabilities. To find out for sure, dump the string's Unicode character values as numbers. See my article on debugging Unicode issues for more information and a useful method for dumping the contents of a string.
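A minimal sketch of such a dump, assuming you only want each UTF-16 code unit as hex; UnicodeDump and DumpChars are hypothetical names, not the method from the article:

```csharp
using System.Text;

static class UnicodeDump
{
    // Hypothetical helper: lists each UTF-16 code unit of the string as U+XXXX.
    // Characters outside the Basic Multilingual Plane show up as two surrogate
    // values (U+D800..U+DFFF) because a C# char is one UTF-16 code unit.
    public static string DumpChars(string text)
    {
        var sb = new StringBuilder();
        foreach (char ch in text)
        {
            if (sb.Length > 0)
                sb.Append(' ');
            sb.Append("U+").Append(((int)ch).ToString("X4"));
        }
        return sb.ToString();
    }
}
```

UnicodeDump.DumpChars("\u4E2D\u6587") (the string 中文) returns "U+4E2D U+6587". If a line read from your file shows values in the CJK range like these, the decoding worked and the console is the culprit; a run of U+FFFD (the replacement character) suggests the wrong encoding was passed to StreamReader.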

I would also avoid the way your current code reads the file line by line. Instead, use something like:

using (StreamReader sr = new StreamReader(path, appropriateEncoding))
{
    string line;
    while ( (line = sr.ReadLine()) != null)
    {
        // ...
    }
}

Calling Peek() relies on the stream supporting seeking, which is usually true for files but not for all streams; on a stream that can't seek, Peek() may return -1 early and end your loop before the data runs out. Also look into File.ReadAllText and File.ReadAllLines if you want to read the whole file at once - they're very handy utility methods.
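A short sketch of those helpers used with an explicit encoding. ChineseTextFile is a hypothetical wrapper; File.ReadAllText, File.ReadAllLines and File.ReadLines are the real System.IO methods, each with an overload that takes an Encoding. Which encoding to pass is still the open question from above:

```csharp
using System;
using System.IO;
using System.Text;

static class ChineseTextFile
{
    // Whole file as one string; the encoding must match how the file was saved.
    public static string ReadAll(string path, Encoding encoding) =>
        File.ReadAllText(path, encoding);

    // Whole file as an array of lines, with line terminators removed.
    public static string[] ReadAllLines(string path, Encoding encoding) =>
        File.ReadAllLines(path, encoding);

    // Lazy alternative: lines are read on demand while the loop runs,
    // so a large file is never held in memory all at once.
    public static void PrintLines(string path, Encoding encoding)
    {
        foreach (string line in File.ReadLines(path, encoding))
            Console.WriteLine(line);
    }
}
```

File.ReadLines is the lazy counterpart of File.ReadAllLines and behaves much like the ReadLine() loop above, so it suits large files; ReadAllText and ReadAllLines are simpler when the file comfortably fits in memory.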



Answer 2:

For Simplified Chinese the encoding is usually GB2312, and for Traditional Chinese it is usually Big5:

// GB2312 (code page 936):
var gb2312 = System.Text.Encoding.GetEncoding(936);

// Big5 (code page 950):
var big5 = System.Text.Encoding.GetEncoding(950);
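A sketch of reading such files end to end. Assumptions: this targets .NET Core 3.0 or .NET 5+, where code pages like 936 and 950 are not available until CodePagesEncodingProvider is registered (as far as I know the provider ships with the runtime there; older setups get it from the System.Text.Encoding.CodePages package). On the .NET Framework these code pages are built in and the registration is unnecessary. LegacyChineseReader and its methods are hypothetical names:

```csharp
using System.IO;
using System.Text;

static class LegacyChineseReader
{
    // Assumption: .NET Core / .NET 5+. Registering the provider makes legacy
    // Windows code pages such as 936 and 950 available to Encoding.GetEncoding.
    // The static constructor runs once, before either method is first used.
    static LegacyChineseReader() =>
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

    // Simplified Chinese. Code page 936 is Windows' GBK, a superset of GB2312.
    public static string ReadGb2312(string path) =>
        File.ReadAllText(path, Encoding.GetEncoding(936));

    // Traditional Chinese saved as Big5 (code page 950).
    public static string ReadBig5(string path) =>
        File.ReadAllText(path, Encoding.GetEncoding(950));
}
```

Neither encoding has a byte order mark, so nothing in the file itself tells the reader which one was used; you still have to know (or be told) whether a file is Simplified or Traditional Chinese.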


Answer 3:

Use Encoding.Unicode instead.

I think you also need to change the console's OutputEncoding for the text to display correctly.
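Putting both suggestions together, a minimal sketch that assumes the file really is UTF-16 (Encoding.Unicode) and that the console font can display Chinese glyphs; ConsoleChineseOutput and its methods are hypothetical names:

```csharp
using System;
using System.IO;
using System.Text;

static class ConsoleChineseOutput
{
    // Assumption: the file was saved as UTF-16 (Encoding.Unicode).
    public static string[] ReadUtf16Lines(string path) =>
        File.ReadAllLines(path, Encoding.Unicode);

    // Switch the console to UTF-8 before printing; with a legacy console code
    // page, characters it cannot represent are typically shown as '?'.
    // The console font must still contain CJK glyphs to render them.
    public static void Print(string[] lines)
    {
        Console.OutputEncoding = Encoding.UTF8;
        foreach (string line in lines)
            Console.WriteLine(line);
    }
}
```

Call ConsoleChineseOutput.Print(ConsoleChineseOutput.ReadUtf16Lines(path)). If the file turns out to be GB2312 or Big5 instead, only the encoding passed to File.ReadAllLines changes; the console setting is independent of how the file was decoded.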



Answer 4:

I just ran into the same problem and have now solved it. I think the main problem is the text editor. When you save a .txt file in Notepad, you can choose the encoding at the bottom of the Save As dialog. The default encoding is ANSI, which may not be able to represent Chinese (it depends on your computer's language settings), while Unicode works for Chinese text. I hope this helps :)

Cheers,

Ronald
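Re-saving the file in Notepad with a Unicode encoding is the simplest fix; if there are many files, the same conversion could be done in code. A minimal sketch, assuming you know the file's original encoding (for an ANSI file saved on a Chinese-locale Windows machine that would typically be code page 936, which on .NET Core needs CodePagesEncodingProvider registered first); ResaveHelper and ConvertToUtf8 are hypothetical names:

```csharp
using System.IO;
using System.Text;

static class ResaveHelper
{
    // Hypothetical helper: decodes the file with the encoding it was saved in,
    // then overwrites it as UTF-8 with a BOM, which Notepad and StreamReader
    // both recognise. ReadAllText closes the file before WriteAllText reopens it.
    public static void ConvertToUtf8(string path, Encoding sourceEncoding)
    {
        string text = File.ReadAllText(path, sourceEncoding);
        File.WriteAllText(path, text, new UTF8Encoding(encoderShouldEmitUTF8Identifier: true));
    }
}
```

After conversion, the StreamReader(path, System.Text.Encoding.UTF8) call from the question would read the file correctly. If sourceEncoding is wrong, the garbled text is written back permanently, so keep a copy of the original.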



Tags: c# text-files