How to read byte[] with current encoding using str

Posted 2020-07-27 04:41

Question:

I would like to read a byte[] in C# using the file's current encoding.

As documented on MSDN, the default encoding is UTF-8 when no encoding is passed to the constructor:

var reader = new StreamReader(new MemoryStream(data));

I have also tried the following, but the data is still decoded as UTF-8:

var reader = new StreamReader(new MemoryStream(data), true);

I need to read the byte[] with the current encoding.

Answer 1:

A file has no encoding. A byte array has no encoding. A byte has no encoding. Encoding is something that transforms bytes to text and vice versa.

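What you see in text editors and the like is actually program magic: the editor tries out different encodings and then guesses which one makes the most sense. The boolean parameter enables only a very limited version of this: StreamReader looks for a byte order mark (BOM) at the start of the data and, if there is none, falls back to UTF-8. The single-argument constructor already does this by default, which is why both of your attempts behave the same. If this does not produce what you want, the detection has simply failed for your data.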
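A minimal sketch of what the boolean flag actually does, assuming C# 8 or later (for `using var`); the class and method names are hypothetical. It reads the bytes with BOM detection switched on and returns the decoded text together with `StreamReader.CurrentEncoding`:

```csharp
using System.IO;
using System.Text;

// Hypothetical helper: lets StreamReader sniff a byte order mark (BOM)
// and reports which encoding it ended up using. Without a BOM it keeps
// the encoding passed in (UTF-8 here), so this is not general detection.
public static class BomDemo
{
    public static (string Text, Encoding Detected) ReadWithBomDetection(byte[] data)
    {
        using var reader = new StreamReader(
            new MemoryStream(data),
            Encoding.UTF8,
            detectEncodingFromByteOrderMarks: true);

        // Detection happens when the buffer is first filled, so read
        // before looking at CurrentEncoding.
        string text = reader.ReadToEnd();
        return (text, reader.CurrentEncoding);
    }
}
```

Bytes that start with `FF FE` are reported as UTF-16 LE (code page 1200) and decoded without the BOM; bytes without any BOM just report the UTF-8 fallback (code page 65001), which is the behaviour described in the question.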

var reader = new StreamReader(new MemoryStream(data), Encoding.Default);

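will use the system's default encoding. On .NET Framework that is the operating system's current ANSI code page; on .NET Core and .NET 5+, Encoding.Default is always UTF-8, so this will not help there. If that is still not what you want, you need to be completely explicit and tell the StreamReader exactly which encoding to use, for example (UTF-8 is only an illustration here, since you said you did not want it):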

var reader = new StreamReader(new MemoryStream(data), Encoding.UTF8);
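To see why being explicit matters, a minimal sketch (hypothetical class and method names; assumes C# 8 or later for `using var`) that decodes the same BOM-less bytes with whichever encoding the caller supplies:

```csharp
using System.IO;
using System.Text;

// Hypothetical helper: decodes the bytes with exactly the encoding given.
public static class ExplicitEncodingDemo
{
    public static string Decode(byte[] data, Encoding encoding)
    {
        // detectEncodingFromByteOrderMarks is false, so StreamReader does
        // not switch to another Unicode encoding based on a BOM.
        using var reader = new StreamReader(new MemoryStream(data), encoding, false);
        return reader.ReadToEnd();
    }
}
```

With `data = Encoding.Unicode.GetBytes("hi")` (UTF-16 LE bytes `68 00 69 00`, no BOM), `Decode(data, Encoding.UTF8)` returns `"h\0i\0"` with embedded NUL characters, while `Decode(data, Encoding.Unicode)` returns `"hi"`. Nothing in the bytes says which one is right; only the caller knows.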


Answer 2:

I tried several ways of working out the encoding of the bytes, and it is not possible, because, as Jan mentions in his reply, a byte array does not carry an encoding. However, if you are decoding with a call such as Encoding.UTF8.GetString(byte[] array), you can always decode the bytes as UTF-8 or ASCII/Unicode and then test the resulting string. For example, this checks whether a string contains characters outside the ASCII range:

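using System.Text;

// Despite its name, this returns true when the string contains any
// non-ASCII character: Encoding.ASCII writes at most one byte per char
// ('?' replaces anything non-ASCII), while UTF-8 needs 2-4 bytes for
// every character above U+007F, so the two byte counts differ.
public static bool IsUnicode(string input)
{
    var asciiBytesCount = Encoding.ASCII.GetByteCount(input);
    var utf8BytesCount = Encoding.UTF8.GetByteCount(input);
    return asciiBytesCount != utf8BytesCount;
}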
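In the same "convert, then test" spirit, a slightly stricter sketch: decode with a UTF-8 decoder that throws on malformed input, and fall back to an encoding you choose if it does. `EncodingGuess` and `DecodeUtf8OrFallback` are hypothetical names, and this is still a guess rather than detection: pure ASCII, for instance, is valid in many encodings.

```csharp
using System.Text;

// Hypothetical heuristic: if the bytes are valid UTF-8, assume UTF-8;
// otherwise decode them with the caller-supplied fallback encoding.
public static class EncodingGuess
{
    // throwOnInvalidBytes: true makes GetString throw a
    // DecoderFallbackException instead of silently inserting U+FFFD.
    private static readonly UTF8Encoding StrictUtf8 =
        new UTF8Encoding(encoderShouldEmitUTF8Identifier: false, throwOnInvalidBytes: true);

    public static string DecodeUtf8OrFallback(byte[] data, Encoding fallback, out Encoding used)
    {
        try
        {
            used = StrictUtf8;
            return StrictUtf8.GetString(data);
        }
        catch (DecoderFallbackException)
        {
            used = fallback;
            return fallback.GetString(data);
        }
    }
}
```

As a fallback, `Encoding.GetEncoding(28591)` (ISO-8859-1) is available on both .NET Framework and .NET Core without registering extra code pages: the bytes `63 61 66 C3 A9` come back as "café" via UTF-8, while `63 61 66 E9` fail strict UTF-8 and come back as "café" via ISO-8859-1.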