C# UTF8 Decoding, returning bytes/numbers instead

Posted 2019-07-19 16:33

Question:

I'm having an issue decoding a file using UTF8Encoding.

I am reading text from a file that I have encoded with UTF-8 (string to bytes). See the following method:

    public static void Encode(string Path)
    {
        string text;
        Byte[] bytes;
        using (StreamReader sr = new StreamReader(Path))
        {
            text = sr.ReadToEnd();
            UTF8Encoding Encoding = new UTF8Encoding();
            bytes = Encoding.GetBytes(text);
            sr.Close();
        }
        using (StreamWriter sw = new StreamWriter(Path))
        {
            foreach (byte b in bytes)
                sw.Write(b.ToString());
            sw.Close();
        }
    }

I then decode it using this method:

    public static String Decode(string Path)
    {
        String text;
        Byte[] bytes;
        using (StreamReader sr = new StreamReader(Path))
        {
            text = sr.ReadToEnd();
            UTF8Encoding Encoding = new UTF8Encoding();
            bytes = Encoding.GetBytes(text);
            text = Encoding.GetString(bytes);
            return text;
        }
    }

But instead of decoding the bytes back into text, it just returns a string of numbers. I can't see what I am doing wrong, as I don't have much experience with this.

EDIT: To clarify what I'm trying to achieve: I want a text file to store the text as bytes, rather than as chars/numbers. This is meant to provide a very simple encryption for the files, so that you can't modify them unless you know what you're doing. The Decode function is then used to read the text (bytes) from the file and turn it back into readable text. I hope this clarifies what I'm trying to achieve.

PS: Sorry for the lack of comments, but I think the code is short enough to be understandable.

Answer 1:

What exactly are you trying to achieve? UTF-8 (like every other encoding) is a method for converting strings to byte arrays (text to raw data) and vice versa. StreamReader and StreamWriter read and write strings from and to files, so there is no need to re-encode anything there. Just calling reader.ReadToEnd() will return the correct string.
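To illustrate the conversion described above, here is a minimal sketch of a round trip between text and UTF-8 bytes. The class and method names (Utf8RoundTrip, ToBytes, FromBytes) are made up for this example. It also hints at why the Decode method in the question returns the numbers unchanged: calling GetBytes and then GetString on the same text simply hands that text back.

```csharp
using System;
using System.Text;

// Hypothetical helper, named only for this illustration.
public static class Utf8RoundTrip
{
    // Text -> raw UTF-8 bytes.
    public static byte[] ToBytes(string text)
    {
        return Encoding.UTF8.GetBytes(text);
    }

    // Raw UTF-8 bytes -> text.
    public static string FromBytes(byte[] bytes)
    {
        return Encoding.UTF8.GetString(bytes);
    }
}
```

For example, Utf8RoundTrip.ToBytes("Hi") yields the two bytes 72 and 105, and passing them to FromBytes gives "Hi" again.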

Your piece of code seems to attempt to write a file containing a list of numbers (as a readable, textual representation) corresponding to the UTF-8 bytes of the given text. OK. Even though this is a very strange idea (I hope you are not trying to do anything like “encryption” with that), it is definitely possible, if that’s really what you want to do. But you need to separate the readable numbers somehow, e.g. by newlines, and parse them when reading them back:

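// Requires: using System.Collections.Generic; using System.IO; using System.Text;

public static void Encode(string path)
{
    byte[] bytes;
    // Read the whole file as text and convert it to UTF-8 bytes.
    using (var sr = new StreamReader(path))
    {
        var text = sr.ReadToEnd();
        bytes = Encoding.UTF8.GetBytes(text);
    }
    // Overwrite the file with one decimal number per line.
    using (var sw = new StreamWriter(path))
    {
        foreach (byte b in bytes)
        {
            sw.WriteLine(b);
        }
    }
}

public static void Decode(string path)
{
    var data = new List<byte>();
    // Parse each line back into a single byte.
    using (var sr = new StreamReader(path))
    {
        string line;
        while ((line = sr.ReadLine()) != null)
            data.Add(Byte.Parse(line));
    }
    // Overwrite the file with the decoded text.
    using (var sw = new StreamWriter(path))
    {
        sw.Write(Encoding.UTF8.GetString(data.ToArray()));
    }
}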
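The need for a separator is easier to see without any file I/O. Written back to back, the bytes 72 and 105 become "72105", which could just as well be read as 7, 210, 5. The following is a rough in-memory sketch of the same idea using spaces instead of newlines; the names NumberListCodec, ToNumberList and FromNumberList are invented for this example.

```csharp
using System;
using System.Globalization;
using System.Linq;
using System.Text;

// Hypothetical helper, named only for this illustration.
public static class NumberListCodec
{
    // "Hi" -> "72 105": each UTF-8 byte as a decimal number, separated by spaces.
    public static string ToNumberList(string text)
    {
        byte[] bytes = Encoding.UTF8.GetBytes(text);
        return string.Join(" ", bytes.Select(b => b.ToString(CultureInfo.InvariantCulture)));
    }

    // "72 105" -> "Hi": split on the separator, parse each number, decode as UTF-8.
    public static string FromNumberList(string numbers)
    {
        byte[] bytes = numbers
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
            .Select(s => byte.Parse(s, CultureInfo.InvariantCulture))
            .ToArray();
        return Encoding.UTF8.GetString(bytes);
    }
}
```

Any separator works as long as it cannot appear inside a number; newlines, as used in the answer, have the advantage that StreamReader.ReadLine does the splitting.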


Answer 2:

This code decodes a Base64-encoded string back to text; it worked on my side.

public static String Decode(string Path)
{
    String text;
    using (StreamReader sr = new StreamReader(Path))
    {
        // The file is expected to contain Base64 text.
        text = sr.ReadToEnd();
        byte[] bytes = Convert.FromBase64String(text);
        System.Text.UTF8Encoding encoder = new System.Text.UTF8Encoding();
        System.Text.Decoder decoder = encoder.GetDecoder();
        int count = decoder.GetCharCount(bytes, 0, bytes.Length);
        char[] arr = new char[count];
        decoder.GetChars(bytes, 0, bytes.Length, arr, 0);
        text = new string(arr);

        return text;
    }
}
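The Decode above only works if the file already contains Base64 text, and the answer does not show the matching encoder. A possible counterpart, as a sketch only, might look like the following; the class name Base64FileCodec and the use of File.WriteAllText and File.ReadAllText are choices made for this example, not part of the answer.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper, named only for this illustration.
public static class Base64FileCodec
{
    // Stores the text as the Base64 form of its UTF-8 bytes.
    public static void Encode(string path, string text)
    {
        byte[] bytes = Encoding.UTF8.GetBytes(text);
        File.WriteAllText(path, Convert.ToBase64String(bytes));
    }

    // Reads the Base64 text back and decodes the bytes as UTF-8.
    public static string Decode(string path)
    {
        byte[] bytes = Convert.FromBase64String(File.ReadAllText(path));
        return Encoding.UTF8.GetString(bytes);
    }
}
```

With this sketch, encoding "Hi" leaves "SGk=" in the file. Note that Base64 is an encoding, not encryption: anyone can reverse it without a key.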


Answer 3:

The StreamReader class will handle decoding for you, so your Decode() method can be as simple as this:

public static string Decode(string path)
{
    // This StreamReader constructor defaults to UTF-8
    using (StreamReader reader = new StreamReader(path))
        return reader.ReadToEnd();
}

I'm not sure what your Encode() method is supposed to do, since the intent seems to be to read a file as UTF-8 and then write the text back to the exact same file as UTF-8. Something like this might make more sense:

public static void Encode(string path, string text)
{
    // This StreamWriter constructor defaults to UTF-8
    using (StreamWriter writer = new StreamWriter(path))
        writer.Write(text);
}
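If you would rather not rely on the constructor defaults, the encoding can be passed explicitly. The sketch below assumes UTF-8 without a byte order mark is what you want; the names PlainTextFile, Save, Load and Utf8NoBom are made up for this example.

```csharp
using System.IO;
using System.Text;

// Hypothetical helper, named only for this illustration.
public static class PlainTextFile
{
    // UTF-8 without a byte order mark, stated explicitly instead of relying on defaults.
    private static readonly Encoding Utf8NoBom = new UTF8Encoding(false);

    // Overwrites the file with the given text, encoded as UTF-8.
    public static void Save(string path, string text)
    {
        using (var writer = new StreamWriter(path, false, Utf8NoBom))
            writer.Write(text);
    }

    // Reads the whole file back as text, decoding it as UTF-8.
    public static string Load(string path)
    {
        using (var reader = new StreamReader(path, Utf8NoBom))
            return reader.ReadToEnd();
    }
}
```

The file on disk then holds the raw UTF-8 bytes of the text, which is exactly what any text editor displays as readable text.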