Java String encoding

Published 2019-02-26 20:04

Question:

What's the difference between

"hello world".getBytes("UTF-8");

and

Charset.forName("UTF-8").encode("hello world").array();

? In most cases the second snippet produces a byte array with zero bytes at the end.

Answer 1:

Your second snippet uses ByteBuffer.array(), which just returns the array backing the ByteBuffer. That may well be longer than the content written to the ByteBuffer.
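To make the point above concrete, here is a minimal sketch that compares the size of the backing array with the number of bytes actually written. The class name BackingArrayDemo and the helper trimmed are made up for illustration. The exact capacity the encoder allocates is an implementation detail, so the sketch prints it rather than assuming a value.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class BackingArrayDemo {

    // Hypothetical helper: copies only the bytes between position and limit
    // out of the backing array, leaving unused capacity behind.
    static byte[] trimmed(ByteBuffer buffer) {
        int start = buffer.arrayOffset() + buffer.position();
        return Arrays.copyOfRange(buffer.array(), start, start + buffer.remaining());
    }

    public static void main(String[] args) {
        ByteBuffer buffer = StandardCharsets.UTF_8.encode("hello world");

        // Number of encoded bytes (limit minus position).
        System.out.println("content bytes: " + buffer.remaining());
        // Size of the whole backing array; may be larger than the content.
        System.out.println("backing array: " + buffer.array().length);
        // Length after trimming matches the content length.
        System.out.println("trimmed:       " + trimmed(buffer).length);
    }
}
```

Any bytes in the backing array beyond the limit were never written by the encoder, which is why they show up as zeros at the end.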

Basically, I would use the first approach if you want a byte[] from a String :) You could use other ways of dealing with the ByteBuffer to convert it to a byte[], but given that String.getBytes(Charset) is available and convenient, I'd just use that...
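As a rough illustration of the String.getBytes(Charset) route, the sketch below uses the StandardCharsets.UTF_8 constant (available since Java 7) instead of a charset name, which also avoids the checked UnsupportedEncodingException that getBytes(String) declares. The class name GetBytesDemo and the helper utf8 are made up for this example.

```java
import java.nio.charset.StandardCharsets;

public class GetBytesDemo {

    // Hypothetical helper: returns exactly the encoded bytes, with no padding.
    static byte[] utf8(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] ascii = utf8("hello world");
        System.out.println(ascii.length); // 11
        System.out.println(ascii[0]);     // 104 (encoded 'h')

        // U+00E9 (e with acute accent) takes two bytes in UTF-8,
        // so five characters become six bytes.
        byte[] accented = utf8("h\u00e9llo");
        System.out.println(accented.length); // 6
    }
}
```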

Sample code to retrieve the bytes from a ByteBuffer:

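import java.nio.ByteBuffer;
import java.nio.charset.Charset;

ByteBuffer buffer = Charset.forName("UTF-8").encode("hello world");
// remaining() is limit minus position, i.e. the number of encoded bytes
byte[] array = new byte[buffer.remaining()];
buffer.get(array);                // copies only the encoded bytes
System.out.println(array.length); // 11
System.out.println(array[0]);     // 104 (encoded 'h')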