I am looking for a way to deserialize a `String` from a `byte[]` in Java with as little garbage produced as possible. Because I am creating my own serializer and de-serializer, I have complete freedom to implement any solution on the server-side (i.e. when serializing data) and on the client-side (i.e. when de-serializing data).
I have managed to efficiently serialize a `String` without incurring any garbage overhead by iterating through the String's chars (`String.charAt(i)`) and converting each `char` (a 16-bit value) into 2x 8-bit values. There is a nice debate regarding this here. An alternative is to use Reflection to access the String's underlying `char[]` directly, but this is outside the scope of the problem.
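For reference, a minimal sketch of that serialization step (the method name and the pre-allocated destination buffer are my own assumptions, not part of any library):

```java
// Writes each 16-bit char of 's' as two bytes into 'dest', starting at 'offset'.
// 'dest' is assumed to be a reusable, pre-allocated buffer with room for
// 2 * s.length() bytes, so the loop itself allocates nothing.
static int writeString(String s, byte[] dest, int offset) {
    int pos = offset;
    for (int i = 0; i < s.length(); i++) {
        char c = s.charAt(i);           // no char[] copy, no iterator garbage
        dest[pos++] = (byte) (c >>> 8); // high byte
        dest[pos++] = (byte) c;         // low byte
    }
    return pos;                         // next free position in the buffer
}
```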
However, it seems impossible for me to deserialize the `byte[]` without creating the `char[]` twice, which seems, well, weird.
The procedure (sketched in the code below):

- Create a `char[]`
- Iterate through the `byte[]` and fill in the `char[]`
- Create the `String` with the `String(char[])` constructor
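Concretely, something like this hypothetical `readString` is what I mean; the intermediate `char[]` and the copy made by the `String(char[])` constructor are the two allocations in question:

```java
// Reads 'length' chars (2 * length bytes) from 'src' starting at 'offset'.
static String readString(byte[] src, int offset, int length) {
    char[] chars = new char[length];                 // allocation #1: the intermediate buffer
    for (int i = 0; i < length; i++) {
        int hi = src[offset + 2 * i]     & 0xFF;
        int lo = src[offset + 2 * i + 1] & 0xFF;
        chars[i] = (char) ((hi << 8) | lo);
    }
    return new String(chars);                        // allocation #2: the constructor copies 'chars'
}
```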
Because of Java's `String` immutability rules, the constructor copies the `char[]`, creating 2x GC overhead. I can always use mechanisms to circumvent this (`Unsafe` `String` allocation + Reflection to set the `char[]` instance), but I just wanted to ask if there are any consequences to this other than me breaking every convention on `String`'s immutability.
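For completeness, this is roughly the circumvention I have in mind — a sketch only, assuming a pre-Java-9 JVM where `String` stores its content in a `char[]` field named `value` (on Java 9+ with compact strings and module restrictions it does not work as written):

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

static String wrapChars(char[] chars) throws Exception {
    // Grab the Unsafe singleton reflectively.
    Field theUnsafe = Unsafe.class.getDeclaredField("theUnsafe");
    theUnsafe.setAccessible(true);
    Unsafe unsafe = (Unsafe) theUnsafe.get(null);

    // Allocate a String without running any constructor, then plug the char[] in directly,
    // so the array is shared rather than copied. This breaks String's immutability contract
    // if 'chars' is ever modified afterwards.
    String s = (String) unsafe.allocateInstance(String.class);
    Field value = String.class.getDeclaredField("value");
    value.setAccessible(true);
    value.set(s, chars);
    return s;
}
```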
Of course, the wisest response to this would be "come on, stop doing this and have trust in the GC; the original `char[]` will be extremely short-lived and G1 will get rid of it momentarily", which actually makes sense if the `char[]` is smaller than half of a G1 region. If it is larger, the `char[]` will be allocated directly as a humongous object (i.e. placed outside the regular G1 regions), and such objects are notoriously hard for G1 to collect efficiently. That's why each allocation matters.
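As a rough illustration of the threshold I am worried about (my own helper with an approximate array-header size, not a JDK API):

```java
// With G1, an object larger than half a region is allocated as a humongous object.
// Region size is typically 1-32 MB depending on heap size, or set via -XX:G1HeapRegionSize.
static boolean likelyHumongous(int charCount, long regionSizeBytes) {
    long arrayBytes = 16L + 2L * charCount; // ~16-byte array header + 2 bytes per char (approximate)
    return arrayBytes > regionSizeBytes / 2;
}
```

With 1 MB regions, for example, the cutoff is around 512 KB, i.e. a `char[]` of roughly a quarter-million characters.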
Any ideas on how to tackle the issue?
Many thanks.