Convert String to/from byte array without encoding

Posted 2019-01-28 07:18

Question:

I have a byte array read over a network connection that I need to transform into a String without any encoding, that is, simply by treating each byte as the low end of a character and leaving the high end zero. I also need to do the converse where I know that the high end of the character will always be zero.

Searching the web yields several similar questions that have all got responses indicating that the original data source must be changed. This is not an option so please don't suggest it.

This is trivial in C but Java appears to require me to write a conversion routine of my own that is likely to be very inefficient. Is there an easy way that I have missed?

Answer 1:

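Here is sample code that converts a String to a byte array and back to a String without using a charset. Note that it writes two bytes per char (high byte first), so it preserves every char rather than using the one-byte-per-char mapping described in the question.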

public class Test
{

    public static void main(String[] args)
    {
        new Test().run();
    }

    public void run()
    {
        String input = "Hèllo world";
        byte[] inputBytes = getBytes(input);
        String output = getString(inputBytes);
        System.out.println(output);
    }

    // Writes each char as two bytes, high byte first.
    public byte[] getBytes(String str)
    {
        char[] chars = str.toCharArray();
        byte[] bytes = new byte[chars.length * 2];
        for (int i = 0; i < chars.length; i++)
        {
            bytes[i * 2] = (byte) (chars[i] >> 8);
            bytes[i * 2 + 1] = (byte) chars[i];
        }

        return bytes;
    }

    // Rebuilds each char from a high byte followed by a low byte.
    public String getString(byte[] bytes)
    {
        char[] chars = new char[bytes.length / 2];
        for (int i = 0; i < chars.length; i++)
        {
            // Mask both bytes so sign extension cannot leak into the result.
            chars[i] = (char) (((bytes[i * 2] & 0xFF) << 8) | (bytes[i * 2 + 1] & 0xFF));
        }

        return new String(chars);
    }
}


Answer 2:

No, you aren't missing anything. There is no easy way to do that because String and char are for text. You apparently don't want to handle your data as text—which would make complete sense if it isn't text. You could do it the hard way that you propose.

An alternative is to assume a character encoding that allows arbitrary sequences of arbitrary byte values (0-255). ISO-8859-1 or IBM437 both qualify. (Windows-1252 only has 251 codepoints. UTF-8 doesn't allow arbitrary sequences.) If you use ISO-8859-1, the resulting string will be the same as your hard way.
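
The ISO-8859-1 route can be sketched as below. This is a minimal illustration rather than a definitive implementation: the class name Latin1Bytes and the methods decode and encode are made up for the example, while StandardCharsets.ISO_8859_1 is the standard JDK constant. It assumes, as the question does, that every char in the converse direction is at most 0xFF.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper; the class and method names are illustrative only.
class Latin1Bytes {

    // Each byte 0x00-0xFF becomes the char U+0000-U+00FF with the same value.
    static String decode(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    // Converse. Assumes every char is <= 0xFF; Java's encoder substitutes
    // '?' for any char outside that range instead of throwing.
    static byte[] encode(String s) {
        return s.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] all = new byte[256];
        for (int i = 0; i < all.length; i++) {
            all[i] = (byte) i;
        }
        String s = decode(all);
        System.out.println(s.length());                   // 256
        System.out.println((int) s.charAt(0xE8));         // 232
        System.out.println(Arrays.equals(all, encode(s))); // true
    }
}
```

All 256 byte values survive the round trip, which is why this charset behaves like "no encoding" for the purposes of the question.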

As for efficiency, the most efficient way to handle an array of bytes is to keep it as an array of bytes.



Answer 3:

This will convert a byte array to a String, filling only the lower 8 bits of each char and leaving the upper 8 bits zero.

public static String stringFromBytes(byte[] byteData) {
    char[] charData = new char[byteData.length];
    for (int i = 0; i < charData.length; i++) {
        // Mask to 0-255 so negative bytes are not sign-extended.
        charData[i] = (char) (byteData[i] & 0xFF);
    }
    return new String(charData);
}
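
The question also asks for the converse. One possible sketch follows; the class LowByteCodec and the method bytesFromString are hypothetical names, not part of the answer above. Throwing on a char above 0xFF is a design choice made for this example, and silently truncating would be the other option.

```java
// Hypothetical converse of stringFromBytes; names are illustrative only.
class LowByteCodec {

    // Copies the low 8 bits of each char into a byte. The question guarantees
    // the high byte is zero, so anything else is treated as a caller error.
    static byte[] bytesFromString(String s) {
        byte[] out = new byte[s.length()];
        for (int i = 0; i < out.length; i++) {
            char c = s.charAt(i);
            if (c > 0xFF) {
                throw new IllegalArgumentException(
                        "char at index " + i + " has a non-zero high byte");
            }
            out[i] = (byte) c;
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] bytes = bytesFromString("H\u00E8llo");
        System.out.println(bytes.length);    // 5
        System.out.println(bytes[1] & 0xFF); // 232
    }
}
```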

The efficiency should be quite good. Like Ben Thurley said, if performance is really such an issue don't convert to a String in the first place but work with the byte array instead.



Answer 4:

Use the deprecated constructor String(byte[] ascii, int hibyte), passing 0 as the high byte:

String string = new String(byteArray, 0);
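
For the reverse direction there is a matching deprecated method, String.getBytes(int srcBegin, int srcEnd, byte[] dst, int dstBegin), which copies the low 8 bits of each char. The sketch below pairs the two; the class HibyteDemo and its method names are made up for illustration, and both JDK calls produce deprecation warnings unless suppressed.

```java
import java.util.Arrays;

// Hypothetical wrapper around the two deprecated JDK members.
class HibyteDemo {

    // Each byte becomes the low 8 bits of a char; the high 8 bits are 0.
    @SuppressWarnings("deprecation")
    static String fromBytes(byte[] bytes) {
        return new String(bytes, 0);
    }

    // Copies the low 8 bits of each char; the high 8 bits are discarded.
    @SuppressWarnings("deprecation")
    static byte[] toBytes(String s) {
        byte[] dst = new byte[s.length()];
        s.getBytes(0, s.length(), dst, 0);
        return dst;
    }

    public static void main(String[] args) {
        byte[] original = {72, (byte) 0xE8, (byte) 0xFF};
        String s = fromBytes(original);
        System.out.println((int) s.charAt(2));                  // 255
        System.out.println(Arrays.equals(original, toBytes(s))); // true
    }
}
```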


Answer 5:

A String is already encoded as Unicode (UTF-16). UTF-16 means that it can take up to two string "characters" (char) to make one displayable character. What you really want to use is the following (note that these calls come from .NET's System.Text.Encoding, not from Java):

byte[] bytes = System.Text.Encoding.Unicode.GetBytes(myString); 

to convert a String to an array of bytes. This does essentially what you did above, except that Encoding.Unicode writes the low byte of each char first and it is reportedly about 10 times faster. If you would like to cut the transmitted data nearly in half for mostly ASCII text, I would recommend converting it to UTF-8 (ASCII is a subset of UTF-8), the encoding most of the web uses, by calling:

byte[] bytes = Encoding.UTF8.GetBytes(myString);

To convert back to a string use:

String myString = Encoding.Unicode.GetString(bytes); 

or

String myString = Encoding.UTF8.GetString(bytes);
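
The calls in this answer are .NET APIs. A rough Java counterpart, assuming Encoding.Unicode corresponds to UTF-16LE, might look like the sketch below. The class TextEncodings and its method names are hypothetical. Both directions treat the data as text, so arbitrary network bytes are not guaranteed to survive a UTF-8 decode, as the last line of main shows.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical Java counterparts of the .NET calls; names are illustrative.
class TextEncodings {

    static byte[] toUtf16le(String s) {
        return s.getBytes(StandardCharsets.UTF_16LE);
    }

    static String fromUtf16le(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_16LE);
    }

    static byte[] toUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    static String fromUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String text = "H\u00E8llo world";
        System.out.println(toUtf16le(text).length); // 22
        System.out.println(toUtf8(text).length);    // 12
        System.out.println(fromUtf8(toUtf8(text)).equals(text)); // true

        // A lone 0xFF byte is not valid UTF-8 and decodes to U+FFFD.
        String bad = fromUtf8(new byte[] {(byte) 0xFF});
        System.out.println((int) bad.charAt(0)); // 65533
    }
}
```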