C# encoding problems (question marks) while reading

Posted 2020-04-10 04:13

I have a problem reading a .txt file in my Windows Phone app.

I've made a simple app that reads a stream from a .txt file and prints it.

Unfortunately, I'm from Italy, and Italian has many accented letters. That's the problem: all the accented letters are printed as question marks.

Here's the sample code:

var resourceStream = Application.GetResourceStream(new Uri("frasi.txt", UriKind.RelativeOrAbsolute));
if (resourceStream != null)
{
    //System.Text.Encoding.Default, true
    using (var reader = new StreamReader(resourceStream.Stream, System.Text.Encoding.UTF8))
    {
        // Read the file line by line; frasi is assumed to be a List<string>
        // declared elsewhere in the page class (not shown in the question).
        string line = reader.ReadLine();
        while (line != null)
        {
            frasi.Add(line);
            line = reader.ReadLine();
        }
    }
}

So, how can I avoid this?

All the best.

[EDIT:] Solution: I hadn't made sure the file was encoded in UTF-8. I re-saved it with the correct encoding and it worked like a charm. Thank you, Oscar.
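Why re-saving the file fixed it can be sketched in a few lines. This is a minimal illustration, assuming .NET 5 or later with top-level statements, and it does not use the Windows Phone APIs. The same Italian word has different bytes in a Western European ANSI code page and in UTF-8; decoding the ANSI bytes as UTF-8 produces U+FFFD, the Unicode replacement character, which is typically rendered as a question mark (often inside a diamond).

```csharp
using System;
using System.Text;

// "caffè" saved in a Western European ANSI code page (Windows-1252 or
// ISO-8859-1): the accented letter is the single byte 0xE8.
byte[] ansiBytes = { 0x63, 0x61, 0x66, 0x66, 0xE8 };

// The same word saved as UTF-8: 'è' (U+00E8) becomes the two bytes 0xC3 0xA8.
byte[] utf8Bytes = Encoding.UTF8.GetBytes("caff\u00E8");

// Decoding the ANSI bytes as UTF-8: 0xE8 starts a multi-byte sequence that
// never completes, so the decoder substitutes U+FFFD.
string wrong = Encoding.UTF8.GetString(ansiBytes);

// Decoding the UTF-8 bytes as UTF-8 round-trips correctly.
string right = Encoding.UTF8.GetString(utf8Bytes);

Console.WriteLine(wrong == "caff\uFFFD");             // True
Console.WriteLine(right == "caff\u00E8");             // True
Console.WriteLine(BitConverter.ToString(utf8Bytes));  // 63-61-66-66-C3-A8
```

The string literals use `\u` escapes on purpose, so the example itself does not depend on how its own source file is encoded.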

Tags: c# encoding
2 answers
霸刀☆藐视天下
Reply #2 · 2020-04-10 04:42

You need to use Encoding.Default. Change:

using (var reader = new StreamReader(resourceStream.Stream, System.Text.Encoding.UTF8))

to

using (var reader = new StreamReader(resourceStream.Stream, System.Text.Encoding.Default))
再贱就再见
Reply #3 · 2020-04-10 04:53

What you have commented out (System.Text.Encoding.Default, true) is what you should be using if you do not know the exact encoding of your source data. System.Text.Encoding.Default uses the encoding of the operating system's current ANSI code page and gives the best chance of decoding the text correctly; the true argument additionally lets StreamReader detect a byte order mark if the file has one. This picks up the current regional settings and uses the matching encoding.

However, note this warning from MSDN:

Different computers can use different encodings as the default, and the default encoding can even change on a single computer. Therefore, data streamed from one computer to another or even retrieved at different times on the same computer might be translated incorrectly. In addition, the encoding returned by the Default property uses best-fit fallback to map unsupported characters to characters supported by the code page. For these two reasons, using the default encoding is generally not recommended. To ensure that encoded bytes are decoded properly, your application should use a Unicode encoding, such as UTF8Encoding or UnicodeEncoding, with a preamble. Another option is to use a higher-level protocol to ensure that the same format is used for encoding and decoding.
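The "Unicode encoding with a preamble" option from that warning can be sketched as follows, assuming .NET 5 or later (for `Encoding.Latin1` and top-level statements). The file bytes are built in memory with a UTF-8 byte order mark (EF BB BF); `StreamReader` is given Latin-1 only as a fallback guess, and `detectEncodingFromByteOrderMarks: true` lets the BOM override it.

```csharp
using System;
using System.IO;
using System.Text;

// Encode "perché" as UTF-8 and prepend the UTF-8 preamble (BOM: EF BB BF).
// Note: GetBytes never includes the preamble, so it is added explicitly.
var utf8WithBom = new UTF8Encoding(encoderShouldEmitUTF8Identifier: true);
byte[] preamble = utf8WithBom.GetPreamble();
byte[] body = utf8WithBom.GetBytes("perch\u00E9");
byte[] fileBytes = new byte[preamble.Length + body.Length];
Buffer.BlockCopy(preamble, 0, fileBytes, 0, preamble.Length);
Buffer.BlockCopy(body, 0, fileBytes, preamble.Length, body.Length);

// Read it back. Latin-1 is only the fallback; the BOM wins when present.
string text;
Encoding detected;
using (var reader = new StreamReader(new MemoryStream(fileBytes), Encoding.Latin1,
                                     detectEncodingFromByteOrderMarks: true))
{
    text = reader.ReadToEnd();
    detected = reader.CurrentEncoding; // reflects detection only after a read
}

Console.WriteLine(text == "perch\u00E9"); // True (the BOM is not part of the text)
Console.WriteLine(detected.WebName);      // utf-8
```

Saving the resource file as "UTF-8 with BOM" in the editor gives the same effect on disk, so the reader no longer depends on the machine's default code page.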

Despite this, in my experience with data coming from many different sources and cultures, this is the option that gives the most consistent results out of the box, especially for diacritic marks, which are turned into question marks when ANSI-encoded data is read as UTF-8.
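A different technique from the one in this answer, shown here only as a sketch: try strict UTF-8 first and fall back to an explicit single-byte code page only when the bytes are not valid UTF-8. It assumes .NET 5 or later. Latin-1 is used instead of `Encoding.Default` because on .NET Core and .NET 5+ `Encoding.Default` always returns UTF-8, so it would not act as a fallback there. Latin-1 agrees with Windows-1252 on the Italian accented letters but not in the 0x80 to 0x9F range (for example the euro sign); Windows-1252 itself needs `CodePagesEncodingProvider` to be registered on .NET Core. `DecodeUtf8OrLatin1` is a hypothetical helper name.

```csharp
using System;
using System.Text;

// Hypothetical helper: decode as strict UTF-8, and fall back to Latin-1
// (ISO-8859-1) only if the bytes are not valid UTF-8.
static string DecodeUtf8OrLatin1(byte[] bytes)
{
    // throwOnInvalidBytes: true makes invalid sequences throw instead of
    // silently becoming U+FFFD replacement characters.
    var strictUtf8 = new UTF8Encoding(encoderShouldEmitUTF8Identifier: false,
                                      throwOnInvalidBytes: true);
    try
    {
        return strictUtf8.GetString(bytes);
    }
    catch (DecoderFallbackException)
    {
        // Every byte is a valid Latin-1 character, so this cannot fail.
        return Encoding.Latin1.GetString(bytes);
    }
}

byte[] ansi = { 0x63, 0x69, 0x74, 0x74, 0xE0 };     // "città" in Latin-1
byte[] utf8 = Encoding.UTF8.GetBytes("citt\u00E0"); // "città" in UTF-8

Console.WriteLine(DecodeUtf8OrLatin1(ansi) == "citt\u00E0"); // True
Console.WriteLine(DecodeUtf8OrLatin1(utf8) == "citt\u00E0"); // True
```

The heuristic works because typical accented ANSI text is rarely also valid UTF-8, but it is not a guarantee; `GetString` also does not strip a UTF-8 BOM, so a file saved with one would need that handled separately.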

I hope this helps.
