I have a situation where I need to know the size of a String/encoding pair, in bytes, but cannot use the getBytes() method because 1) the String is very large and duplicating it in a byte[] array would use a large amount of memory, and, more to the point, 2) getBytes() allocates a byte[] array sized as the length of the String times the maximum possible bytes per character. So if I have a String with 1.5B characters and UTF-16 encoding, getBytes() will try to allocate a 3GB array and fail, since arrays are limited to 2^31 - X elements (X is Java version specific).
So, is there some way to calculate the byte size of a String/encoding pair directly from the String object?
UPDATE:
Here's a working implementation of jtahlborn's answer:
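    private class CountingOutputStream extends OutputStream {
        // long rather than int: the sizes in question can exceed Integer.MAX_VALUE
        long total;

        @Override
        public void write(int i) {
            // single-byte writes are not expected from the encoding writer
            throw new RuntimeException("don't use");
        }

        @Override
        public void write(byte[] b) {
            total += b.length;
        }

        @Override
        public void write(byte[] b, int offset, int len) {
            // count the bytes and discard them
            total += len;
        }
    }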
Simple, just write it to a dummy output stream:
It's not only simple, but probably just as fast as the other, more "complex" answers.
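A minimal sketch of the dummy-output-stream idea, not necessarily the code the answer originally showed. The class name EncodedSizeDemo, the helper encodedSize, and the 8192-character write chunk are assumptions made for illustration. The String is fed to the writer in slices because, as far as I can tell, writing a whole String at once may copy it into a temporary char[], which is exactly what we want to avoid here.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodedSizeDemo {

    /** Discards every byte it receives and only keeps a running count. */
    static final class CountingOutputStream extends OutputStream {
        long total; // long, since the count may exceed Integer.MAX_VALUE

        @Override
        public void write(int b) {
            total++;
        }

        @Override
        public void write(byte[] b, int off, int len) {
            total += len;
        }
    }

    /** Hypothetical helper: encoded size of s in charset cs, in bytes. */
    static long encodedSize(String s, Charset cs) {
        CountingOutputStream counter = new CountingOutputStream();
        final int chunk = 8192; // assumed slice size, in chars
        try (Writer w = new OutputStreamWriter(counter, cs)) {
            for (int off = 0; off < s.length(); off += chunk) {
                // write a slice so no full-size copy of the String is needed
                w.write(s, off, Math.min(chunk, s.length() - off));
            }
        } catch (IOException e) {
            // the counting stream never throws; rethrow unchecked just in case
            throw new UncheckedIOException(e);
        }
        // closing the writer above flushed any buffered bytes into the counter
        return counter.total;
    }

    public static void main(String[] args) {
        String s = "h\u00e9llo";
        System.out.println(encodedSize(s, StandardCharsets.UTF_8));
        System.out.println(encodedSize(s, StandardCharsets.UTF_16));
    }
}
```

The result should match getBytes(cs).length for the same String, including the byte order mark that Java's UTF-16 charset emits.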
The same using apache-commons libraries:
Here's an apparently working implementation:
The output is:
In practice I'd increase ENCODE_CHUNK to 10MChars or so.
Probably slightly less efficient than brettw's answer, but simpler to implement.
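For readers who cannot see the original code, here is a rough sketch of what a chunked approach can look like. It is a reconstruction under assumptions, not the answerer's implementation: the class name ChunkedEncodeCounter and the method encodedSize are hypothetical, and ENCODE_CHUNK is assumed to be a chunk size in chars that determines how large the reusable output buffer is. The String is wrapped in a CharBuffer (a view, not a copy) and encoded into the same small buffer over and over, counting the bytes and discarding them.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;

public class ChunkedEncodeCounter {

    // Assumed meaning: roughly how many chars are encoded per pass.
    static final int ENCODE_CHUNK = 8192;

    /** Hypothetical helper: encoded size of s in charset cs, in bytes. */
    static long encodedSize(CharSequence s, Charset cs) {
        // REPLACE mirrors what String.getBytes(Charset) does with bad input
        CharsetEncoder enc = cs.newEncoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);

        CharBuffer in = CharBuffer.wrap(s); // read-only view, no copy
        int outSize = (int) Math.ceil(ENCODE_CHUNK * (double) enc.maxBytesPerChar());
        ByteBuffer out = ByteBuffer.allocate(outSize);

        long total = 0;
        CoderResult result;
        do {
            // encode until the output buffer fills up or the input runs out
            result = enc.encode(in, out, true);
            total += out.position();
            out.clear(); // throw the bytes away and reuse the buffer
        } while (result.isOverflow());

        do {
            // some encoders hold back trailing bytes until flush
            result = enc.flush(out);
            total += out.position();
            out.clear();
        } while (result.isOverflow());

        return total;
    }

    public static void main(String[] args) {
        System.out.println(encodedSize("h\u00e9llo", java.nio.charset.StandardCharsets.UTF_8));
    }
}
```

Memory use stays bounded by the output buffer regardless of the String's length, which is the point of raising or lowering the chunk size.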
Ok, this is extremely gross. I admit that, but this stuff is hidden by the JVM, so we have to dig a little. And sweat a little.
First, we want the actual char[] that backs a String without making a copy. To do this we have to use reflection to get at the 'value' field:
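A hedged sketch of that reflection step follows. The class name StringBackingArray and the method backingArray are made up for illustration. Two caveats that I am fairly sure of: on Java 9 and later the 'value' field is a byte[] rather than a char[], and recent JDKs refuse this kind of access to java.lang internals unless the JVM is started with an option such as --add-opens java.base/java.lang=ALL-UNNAMED. The sketch therefore returns null instead of failing when access is denied.

```java
import java.lang.reflect.Field;

public class StringBackingArray {

    /**
     * Hypothetical helper: returns the array backing the String
     * (char[] on older JVMs, byte[] on newer ones), or null if the
     * JVM does not allow reflective access to it.
     */
    static Object backingArray(String s) {
        try {
            Field valueField = String.class.getDeclaredField("value");
            valueField.setAccessible(true); // may be rejected on modern JDKs
            return valueField.get(s);
        } catch (ReflectiveOperationException | RuntimeException e) {
            // field missing, or access blocked by the module system
            return null;
        }
    }

    public static void main(String[] args) {
        Object backing = backingArray("hello");
        System.out.println(backing == null
                ? "no reflective access"
                : backing.getClass().getSimpleName());
    }
}
```

Because the result depends on the JVM version and launch flags, any code built on top of it should treat the null case as normal rather than exceptional.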
Next you need to implement a subclass of java.nio.ByteBuffer. Something like:
Ignore all of the getters, and implement all of the put methods like put(byte) and putChar(char) etc. Inside something like put(byte), increment length by 1; inside of put(byte[]), increment length by the array length. Get it? Everything that is put, you add the size of whatever it is to length. But you're not storing anything in your ByteBuffer, you're just counting and throwing away, so no space is taken. If you breakpoint the put methods, you can probably figure out which ones you actually need to implement. putFloat(float) is probably not used, for example. One caveat: ByteBuffer's constructors are package-private, so a class outside java.nio cannot extend it directly.
Now for the grand finale, putting it all together: