Converting char array into byte array and back aga

2019-01-13 20:48发布

问题:

I'm looking to convert a Java char array to a byte array without creating an intermediate String, as the char array contains a password. I've looked up a couple of methods, but they all seem to fail:

char[] password = "password".toCharArray();

byte[] passwordBytes1 = new byte[password.length*2];
ByteBuffer.wrap(passwordBytes1).asCharBuffer().put(password);

byte[] passwordBytes2 = new byte[password.length*2];
for(int i=0; i<password.length; i++) {
    passwordBytes2[2*i] = (byte) ((password[i]&0xFF00)>>8); 
    passwordBytes2[2*i+1] = (byte) (password[i]&0x00FF); 
}

String passwordAsString = new String(password);
String passwordBytes1AsString = new String(passwordBytes1);
String passwordBytes2AsString = new String(passwordBytes2);

System.out.println(passwordAsString);
System.out.println(passwordBytes1AsString);
System.out.println(passwordBytes2AsString);
assertTrue(passwordAsString.equals(passwordBytes1) || passwordAsString.equals(passwordBytes2));

The assertion always fails (and, critically, when the code is used in production, the password is rejected), yet the print statements print out password three times. Why are passwordBytes1AsString and passwordBytes2AsString different from passwordAsString, yet appear identical? Am I missing out a null terminator or something? What can I do to make the conversion and unconversion work?

回答1:

The problem is your use of the String(byte[]) constructor, which uses the platform default encoding. That's almost never what you should be doing - if you pass in "UTF-16" as the character encoding to work, your tests will probably pass. Currently I suspect that passwordBytes1AsString and passwordBytes2AsString are each 16 characters long, with every other character being U+0000.



回答2:

Conversion between char and byte is character set encoding and decoding.I prefer to make it as clear as possible in code. It doesn't really mean extra code volume:

 Charset latin1Charset = Charset.forName("ISO-8859-1"); 
 charBuffer = latin1Charset.decode(ByteBuffer.wrap(byteArray)); // also decode to String
 byteBuffer = latin1Charset.encode(charBuffer);                 // also decode from String

Aside:

java.nio classes and java.io Reader/Writer classes use ByteBuffer & CharBuffer (which use byte[] and char[] as backing arrays). So often preferable if you use these classes directly. However, you can always do:

 byteArray = ByteBuffer.array();  byteBuffer = ByteBuffer.wrap(byteArray);  
 byteBuffer.get(byteArray);       charBuffer.put(charArray);
 charArray = CharBuffer.array();  charBuffer = ByteBuffer.wrap(charArray);
 charBuffer.get(charArray);       charBuffer.put(charArray);


回答3:

Original Answer

    public byte[] charsToBytes(char[] chars){
        Charset charset = Charset.forName("UTF-8");
        ByteBuffer byteBuffer = charset.encode(CharBuffer.wrap(chars));
        return Arrays.copyOf(byteBuffer.array(), byteBuffer.limit());
    }

    public char[] bytesToChars(byte[] bytes){
        Charset charset = Charset.forName("UTF-8");
        CharBuffer charBuffer = charset.decode(ByteBuffer.wrap(bytes));
        return Arrays.copyOf(charBuffer.array(), charBuffer.limit());    
    }

Edited to use StandardCharsets

public byte[] charsToBytes(char[] chars)
{
    final ByteBuffer byteBuffer = StandardCharsets.UTF_8.encode(CharBuffer.wrap(chars));
    return Arrays.copyOf(byteBuffer.array(), byteBuffer.limit());
}

public char[] bytesToChars(byte[] bytes)
{
    final CharBuffer charBuffer = StandardCharsets.UTF_8.decode(ByteBuffer.wrap(bytes));
    return Arrays.copyOf(charBuffer.array(), charBuffer.limit());    
}

Here is a JavaDoc page for StandardCharsets. Note this on the JavaDoc page:

These charsets are guaranteed to be available on every implementation of the Java platform.



回答4:

If you want to use a ByteBuffer and CharBuffer, don't do the simple .asCharBuffer(), which simply does an UTF-16 (LE or BE, depending on your system - you can set the byte-order with the order method) conversion (since the Java Strings and thus your char[] internally uses this encoding).

Use Charset.forName(charsetName), and then its encode or decode method, or the newEncoder /newDecoder.

When converting your byte[] to String, you also should indicate the encoding (and it should be the same one).



回答5:

I would do is use a loop to convert to bytes and another to conver back to char.

char[] chars = "password".toCharArray();
byte[] bytes = new byte[chars.length*2];
for(int i=0;i<chars.length;i++) {
   bytes[i*2] = (byte) (chars[i] >> 8);
   bytes[i*2+1] = (byte) chars[i];
}
char[] chars2 = new char[bytes.length/2];
for(int i=0;i<chars2.length;i++) 
   chars2[i] = (char) ((bytes[i*2] << 8) + (bytes[i*2+1] & 0xFF));
String password = new String(chars2);


回答6:

You should make use of getBytes() instead of toCharArray()

Replace the line

char[] password = "password".toCharArray();

with

byte[] password = "password".getBytes();


回答7:

This is an extension to Peter Lawrey's answer. In order to backward (bytes-to-chars) conversion work correctly for the whole range of chars, the code should be as follows:

char[] chars = new char[bytes.length/2];
for (int i = 0; i < chars.length; i++) {
   chars[i] = (char) (((bytes[i*2] & 0xff) << 8) + (bytes[i*2+1] & 0xff));
}

We need to "unsign" bytes before using (& 0xff). Otherwise half of the all possible char values will not get back correctly. For instance, chars within [0x80..0xff] range will be affected.



回答8:

When you use GetBytes From a String in Java, The return result will depend on the default encode of your computer setting.(eg: StandardCharsetsUTF-8 or StandardCharsets.ISO_8859_1etc...).

So, whenever you want to getBytes from a String Object. Make sure to give a encode . like :

String sample = "abc";
Byte[] a_byte = sample .getBytes(StandardCharsets.UTF_8);

Let check what has happened with the code. In java, the String named sample , is stored by Unicode. every char in String stored by 2 byte.

sample :  value: "abc"   in Memory(Hex):  00 61 00 62 00 63
        a -> 00 61
        b -> 00 62
        c -> 00 63

But, When we getBytes From a String, we have

Byte[] a_byte = sample .getBytes(StandardCharsets.UTF_8)
//result is : 61 62 63
//length: 3 bytes

Byte[] a_byte = sample .getBytes(StandardCharsets.UTF_16BE)  
//result is : 00 61 00 62 00 63        
//length: 6 bytes

In order to get the oringle byte of the String. We can just read the Memory of the string and get Each byte of the String.Below is the sample Code:

public static byte[] charArray2ByteArray(char[] chars){
    int length = chars.length;
    byte[] result = new byte[length*2+2];
    int i = 0;
    for(int j = 0 ;j<chars.length;j++){
        result[i++] = (byte)( (chars[j] & 0xFF00) >> 8 );
        result[i++] = (byte)((chars[j] & 0x00FF)) ;
    }
    return result;
}

Usages:

String sample = "abc";
//First get the chars of the String,each char has two bytes(Java).
Char[] sample_chars = sample.toCharArray();
//Get the bytes
byte[] result = charArray2ByteArray(sample_chars).

//Back to String.
//Make sure we use UTF_16BE. Because we read the memory of Unicode of  
//the String from Left to right. That's the same reading 
//sequece of  UTF-16BE.
String sample_back= new String(result , StandardCharsets.UTF_16BE);