I have some strings that are roughly 10K characters each. There is plenty of repetition in them. They are serialized JSON objects. I'd like to easily compress them into a byte array, and uncompress them from a byte array.
How can I most easily do this? I'm looking for methods so I can do the following:
String original = "....long string here with 10K characters...";
byte[] compressed = StringCompressor.compress(original);
String decompressed = StringCompressor.decompress(compressed);
assert original.equals(decompressed);
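Here is a minimal sketch of that API. It assumes gzip from `java.util.zip` is acceptable, which suits long, repetitive text like serialized JSON, and it needs Java 9+ for `InputStream.readAllBytes()`. The class body is illustrative, not taken from any answer below. Wrapping `IOException` in `UncheckedIOException` is a design choice so the calls match the usage above without a try/catch.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Illustrative sketch of the StringCompressor from the question, built on java.util.zip gzip streams.
final class StringCompressor {
    private StringCompressor() {}

    /** Encodes the string as UTF-8, then gzip-compresses those bytes. */
    static byte[] compress(String text) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        // try-with-resources closes the GZIPOutputStream, which writes the gzip trailer
        // before the buffer is read below.
        try (GZIPOutputStream gzip = new GZIPOutputStream(bytes)) {
            gzip.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            // Not expected with in-memory streams; rethrown unchecked so callers need no try/catch.
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Reverses compress(): gunzips the bytes and decodes them as UTF-8. */
    static String decompress(byte[] compressed) {
        try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            // readAllBytes() needs Java 9+; decoding all bytes at once never splits a multi-byte character.
            return new String(gzip.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            // Also thrown when the input is not valid gzip data.
            throw new UncheckedIOException(e);
        }
    }
}
```

gzip adds about 18 bytes of header and trailer, so it pays off for long strings like the 10K-character ones here rather than for very short ones.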
I made a library to solve the problem of compressing generic Strings (especially short ones). It tries to compress the String using various algorithms (plain UTF-8, 5-bit encoding for Latin letters, Huffman encoding, gzip for long Strings) and chooses the one with the shortest result (in the worst case, it will choose the UTF-8 encoding, so you never risk losing space).
I hope it is useful; here's the link: https://github.com/lithedream/lithestring
EDIT: I realized that your Strings are always "long". My library defaults to gzip for those sizes, so I fear I cannot do better for you.
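To illustrate the "try several encodings, keep the shortest" idea in general terms, here is a hypothetical two-candidate sketch. It is not the lithestring API. The class name `ShortestEncoding` and its one-byte tag format are assumptions, and it swaps in zlib-wrapped DEFLATE (`java.util.zip.DeflaterOutputStream`, the same compression algorithm gzip uses, with a smaller header) in place of gzip. Unlike the library's claim, this sketch costs one tag byte over plain UTF-8 in the worst case. It needs Java 9+ for `readAllBytes()`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

// Hypothetical illustration of "pick the shortest encoding"; NOT the lithestring API.
final class ShortestEncoding {
    private static final byte TAG_UTF8 = 0;    // body is plain UTF-8
    private static final byte TAG_DEFLATE = 1; // body is zlib-wrapped DEFLATE of the UTF-8 bytes

    private ShortestEncoding() {}

    /** Tries each candidate encoding and keeps the shortest, prefixed by a one-byte tag. */
    static byte[] encode(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        byte[] deflated = deflate(utf8);
        // Use DEFLATE only when it is strictly shorter, so the worst case is UTF-8 plus one tag byte.
        boolean useDeflate = deflated.length < utf8.length;
        byte[] body = useDeflate ? deflated : utf8;
        byte[] out = new byte[body.length + 1];
        out[0] = useDeflate ? TAG_DEFLATE : TAG_UTF8;
        System.arraycopy(body, 0, out, 1, body.length);
        return out;
    }

    /** Reads the tag byte and undoes whichever encoding it names. */
    static String decode(byte[] encoded) {
        byte[] body = Arrays.copyOfRange(encoded, 1, encoded.length);
        switch (encoded[0]) {
            case TAG_UTF8:
                return new String(body, StandardCharsets.UTF_8);
            case TAG_DEFLATE:
                return new String(inflate(body), StandardCharsets.UTF_8);
            default:
                throw new IllegalArgumentException("unknown encoding tag: " + encoded[0]);
        }
    }

    private static byte[] deflate(byte[] data) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        // Closing the stream finishes the compressed data and releases the internal Deflater.
        try (DeflaterOutputStream out = new DeflaterOutputStream(bytes)) {
            out.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    private static byte[] inflate(byte[] data) {
        try (InflaterInputStream in = new InflaterInputStream(new ByteArrayInputStream(data))) {
            return in.readAllBytes(); // Java 9+
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A real implementation would add more candidates (such as the 5-bit and Huffman encodings mentioned above) behind additional tag values, and decode would dispatch on the tag the same way.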
Peter Lawrey's answer can be improved a bit by using this less complex code for the decompress function
You can try