I have a byte array read over a network connection that I need to transform into a String without any encoding, that is, simply by treating each byte as the low end of a character and leaving the high end zero. I also need to do the converse where I know that the high end of the character will always be zero.
Searching the web yields several similar questions that have all got responses indicating that the original data source must be changed. This is not an option so please don't suggest it.
This is trivial in C but Java appears to require me to write a conversion routine of my own that is likely to be very inefficient. Is there an easy way that I have missed?
Here is sample code that converts a String to a byte array and back to a String without encoding.
public class Test
{
    public static void main(String[] args)
    {
        Test t = new Test();
        t.Test();
    }

    public void Test()
    {
        String input = "Hèllo world";
        byte[] inputBytes = GetBytes(input);
        String output = GetString(inputBytes);
        System.out.println(output);
    }

    // Stores each char as two bytes, high byte first.
    public byte[] GetBytes(String str)
    {
        char[] chars = str.toCharArray();
        byte[] bytes = new byte[chars.length * 2];
        for (int i = 0; i < chars.length; i++)
        {
            bytes[i * 2] = (byte) (chars[i] >> 8);
            bytes[i * 2 + 1] = (byte) chars[i];
        }
        return bytes;
    }

    // Rebuilds each char from a pair of bytes, high byte first.
    public String GetString(byte[] bytes)
    {
        char[] chars = new char[bytes.length / 2];
        for (int i = 0; i < chars.length; i++)
            chars[i] = (char) ((bytes[i * 2] << 8) + (bytes[i * 2 + 1] & 0xFF));
        return new String(chars);
    }
}
No, you aren't missing anything. There is no easy way to do that because String and char are for text. You apparently don't want to handle your data as text, which would make complete sense if it isn't text. You could do it the hard way that you propose.
An alternative is to assume a character encoding that allows arbitrary sequences of arbitrary byte values (0-255). ISO-8859-1 or IBM437 both qualify. (Windows-1252 only has 251 codepoints. UTF-8 doesn't allow arbitrary sequences.) If you use ISO-8859-1, the resulting string will be the same as your hard way.
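To make the ISO-8859-1 suggestion concrete, here is a minimal sketch using only the standard library. The class and method names (Latin1RoundTrip, decode, encode) are made up for illustration. It relies on ISO-8859-1 mapping each byte value 0-255 to the char with the same numeric value, so the high byte of every char stays zero.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper class, named only for this example.
final class Latin1RoundTrip {
    // Each byte 0x00-0xFF becomes the char U+0000-U+00FF with the same value.
    static String decode(byte[] data) {
        return new String(data, StandardCharsets.ISO_8859_1);
    }

    // The converse. Lossless only while every char is <= U+00FF;
    // anything above that is replaced by '?' by the encoder.
    static byte[] encode(String s) {
        return s.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // Build an array holding every possible byte value once.
        byte[] all = new byte[256];
        for (int i = 0; i < all.length; i++) {
            all[i] = (byte) i;
        }
        String s = decode(all);
        System.out.println(s.length());                    // prints 256
        System.out.println(Arrays.equals(all, encode(s))); // prints true
    }
}
```

If a string might contain chars above U+00FF, encode silently substitutes '?', so the round trip is only safe under the question's assumption that the high byte is always zero.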
As for efficiency, the most efficient way to handle an array of bytes is to keep it as an array of bytes.
This will convert a byte array to a String, filling only the lower 8 bits of each char and leaving the upper 8 bits zero.
public static String stringFromBytes(byte[] byteData) {
    char[] charData = new char[byteData.length];
    for (int i = 0; i < charData.length; i++) {
        // Mask with 0xFF so negative bytes are not sign-extended.
        charData[i] = (char) (byteData[i] & 0xFF);
    }
    return new String(charData);
}
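The question also asks for the converse direction. A sketch of that, in the same hand-rolled style, might look like the following. The names LowByteCodec and bytesFromString are hypothetical, and rejecting chars above 0xFF is a design choice of this example, not something the question requires.

```java
// Hypothetical helper class, named only for this example.
final class LowByteCodec {
    // Takes the low 8 bits of each char. Assumes the high 8 bits are zero
    // and throws if they are not, rather than silently dropping data.
    static byte[] bytesFromString(String s) {
        byte[] out = new byte[s.length()];
        for (int i = 0; i < out.length; i++) {
            char c = s.charAt(i);
            if (c > 0xFF) {
                throw new IllegalArgumentException("char above 0xFF at index " + i);
            }
            out[i] = (byte) c;
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] bytes = bytesFromString("H\u00e8llo");
        // Print each byte as an unsigned value.
        for (byte b : bytes) {
            System.out.print((b & 0xFF) + " ");
        }
        System.out.println();
    }
}
```

Dropping the range check gives a plain truncating copy, which is a little faster but hides bad input.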
The efficiency should be quite good. Like Ben Thurley said, if performance is really such an issue, don't convert to a String in the first place; work with the byte array instead.
Using the deprecated constructor String(byte[] ascii, int hibyte), with hibyte set to 0:
String string = new String(byteArray, 0);
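For completeness, the JDK also has a deprecated counterpart for the other direction, String.getBytes(int srcBegin, int srcEnd, byte[] dst, int dstBegin), which copies the low 8 bits of each char. A small sketch of the pair follows. The class name DeprecatedHiByteDemo is made up, and both calls are deprecated, so treat this as an illustration rather than a recommendation.

```java
import java.util.Arrays;

// Hypothetical demo class, named only for this example.
final class DeprecatedHiByteDemo {
    // Deprecated constructor: each char is (hibyte << 8) | (b & 0xFF); hibyte is 0 here.
    @SuppressWarnings("deprecation")
    static String decode(byte[] data) {
        return new String(data, 0);
    }

    // Deprecated getBytes overload: each byte receives the low 8 bits of its char.
    @SuppressWarnings("deprecation")
    static byte[] encode(String s) {
        byte[] out = new byte[s.length()];
        s.getBytes(0, s.length(), out, 0);
        return out;
    }

    public static void main(String[] args) {
        byte[] data = {(byte) 0x48, (byte) 0xE8, (byte) 0xFF};
        String s = decode(data);
        System.out.println((int) s.charAt(1));                // prints 232
        System.out.println(Arrays.equals(data, encode(s)));   // prints true
    }
}
```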
String is already encoded as Unicode (UTF-16). UTF-16 means that it can take up to two string "characters" (char) to make one displayable character. What you really want to use is:
byte[] bytes = myString.getBytes(StandardCharsets.UTF_16BE);
to convert a String to an array of bytes (StandardCharsets is in java.nio.charset). This produces the same two bytes per char, high byte first, as the hand-written loop above, and is likely to be faster. If the text is mostly ASCII and you would like to cut the transmitted data nearly in half, I would recommend converting it to UTF-8 (ASCII is a subset of UTF-8), the most widely used encoding on the internet, by calling:
byte[] bytes = myString.getBytes(StandardCharsets.UTF_8);
To convert back to a string use:
String myString = new String(bytes, StandardCharsets.UTF_16BE);
or
String myString = new String(bytes, StandardCharsets.UTF_8);
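In Java, the standard-library way to do these conversions is String.getBytes(Charset) and new String(byte[], Charset). The sketch below compares the encoded sizes and also shows why UTF-8 does not fit the original question: decoding is only lossless when the bytes really are UTF-8 text. The class and method names (Utf16VsUtf8, utf16Size, utf8Size, utf8RoundTrip) are invented for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class, named only for this example.
final class Utf16VsUtf8 {
    // Two bytes per char, high byte first, no byte order mark.
    static int utf16Size(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE).length;
    }

    // One byte per ASCII char, more for everything else.
    static int utf8Size(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Decodes raw bytes as UTF-8 and encodes the result again.
    // Invalid input is replaced with U+FFFD, so the bytes can change.
    static byte[] utf8RoundTrip(byte[] raw) {
        String decoded = new String(raw, StandardCharsets.UTF_8);
        return decoded.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String text = "H\u00e8llo world";
        System.out.println(utf16Size(text)); // prints 22
        System.out.println(utf8Size(text));  // prints 12

        byte[] raw = {(byte) 0xFF};          // 0xFF never occurs in valid UTF-8
        System.out.println(Arrays.equals(raw, utf8RoundTrip(raw))); // prints false
    }
}
```

So UTF-8 is a good choice when the payload is text, while arbitrary binary data needs either a single-byte charset that maps all 256 values or no String conversion at all.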