How would I write a method to compress a Gzip string that does not contain a header, so that it compresses to exactly the same bytes it had before I decompressed it? The original compression is done in C#, and I am inflating in Ruby using the following method:
EDIT: Basically, I would like the deflate method that matches this inflate:
```ruby
require 'zlib'

def inflate(string)
  # Negative window bits: raw deflate data, no zlib header or trailer
  zstream = Zlib::Inflate.new(-Zlib::MAX_WBITS)
  buf = zstream.inflate(string)
  zstream.finish
  zstream.close
  buf
end
```
Before decompressing, the string is:
"5\x891\n\xC30\x10\x04{\xBDb\xEB\xE0F&\x81\xA4\xCA3\xDC\xA81\xD2\x1A]\xA1\x13\xB1.\x100\xFEF\xDE\e\x19\x9Cb\x99Yf\xCA\xB3A\x1A,\x13\xB1\x96R\x15I\x96\x85+5\x12\xA2=\xF4:\xAFb\xB9\xD0$\xA2\xF1\xF5>\xDA\xD3\xB9\x9A\xA8f\xFC\xD8\xE6\xFD\x00\x7F\xEB{\f!Uk{\xCF,\x91\xDC\x1C\x10J\xC4\xF7z\xCA\xE8p9\xF8\xFF\xF7\x93\xDEw\xD9\x7F"
And after decompressing using inflate, it is:
"What is the common difference in this arithmetic sequence?\n\n\\indenttext{11, 15, 19,\\dots}\n\n\\emcee{\n \\mc \x964\n \\mc 2\n *\\mc 4\n \\mc 8\n \\mc 11\n }"
I've tried writing several deflate methods, but none of them gets it back to the original. Thanks for your help!
EDIT: The original compression was done in .NET 2.0 using the following:
```csharp
byte[] compressedStringBytes = CompressGzipString(String);
```
and CompressGzipString does:
```csharp
MemoryStream compressed = new MemoryStream();
DeflaterOutputStream zosCompressed = new DeflaterOutputStream(compressed, new Deflater(Deflater.BEST_COMPRESSION, true));
zosCompressed.Write(data, 0, data.Length);
```
If it's not possible to reproduce the exact original, what would be the most standardized compression, meaning the most general one, whose output can be decompressed the same way the original was?
Different compressors, different versions of the same compressor, or the same version of the same compressor with different settings, can and often will produce different output for the same input, even if they all use the same compressed data format (e.g. deflate). The only thing guaranteed is that when you decompress, you get back exactly what you started with. In fact, that's really all you need guaranteed. Why do you want exactly the same compressed stream?
As noted by Ron Warholic, you wouldn't even want to reproduce the compressed output of .NET's broken deflate implementation prior to .NET 4.5. Since .NET 2.0 used its own unique, broken deflate implementation, you cannot duplicate it with Ruby, which uses zlib.
Also as noted by Ron Warholic, Ruby and .NET 4.5 or later both use zlib, and so should produce the same compressed output when the same compression level is selected. That is not guaranteed forever, though, since a new version of zlib may produce different output, and one of Ruby or .NET might update to it while the other does not. Also, as noted below, you do not have direct control over the compression level with .NET's classes.
Any correct implementation of lossless compression and decompression will have this property. You will always get back to the exact original, regardless of how the compressed data may differ. There is no "most standardized compression".
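As a quick illustration in Ruby, the minimal sketch below compresses the same made-up text at zlib levels 0 and 9 as raw deflate, the format the question's `Zlib::Inflate.new(-Zlib::MAX_WBITS)` expects. The helper names `raw_deflate` and `raw_inflate` are hypothetical. The compressed bytes differ, yet both decompress to the identical original.

```ruby
require 'zlib'

# Hypothetical helper: raw deflate (no zlib header/trailer) at a given level.
def raw_deflate(data, level)
  z = Zlib::Deflate.new(level, -Zlib::MAX_WBITS)
  out = z.deflate(data, Zlib::FINISH)
  z.close
  out
end

# Hypothetical helper: same shape as the question's inflate method.
def raw_inflate(data)
  z = Zlib::Inflate.new(-Zlib::MAX_WBITS)
  out = z.inflate(data)
  z.finish
  z.close
  out
end

text = "What is the common difference in this arithmetic sequence? " * 20

stored = raw_deflate(text, Zlib::NO_COMPRESSION)   # level 0: stored blocks
best   = raw_deflate(text, Zlib::BEST_COMPRESSION) # level 9

puts stored.bytesize  # slightly larger than the input
puts best.bytesize    # much smaller, since the text repeats
puts stored == best   # false: different compressed bytes...
puts raw_inflate(stored) == raw_inflate(best)  # true: ...same decompressed text
```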
Your `Zlib::Inflate.new(-Zlib::MAX_WBITS)` is expecting a raw deflate stream, with no header or trailer. So you would need to produce that on the C# side.
It is not clear from the .NET documentation whether the `DeflateStream` class compresses to the deflate format or the zlib format (where the latter is the deflate format with a zlib wrapper, consisting of two prefix bytes and four postfix bytes for data integrity checking). If it compresses to the deflate format, then it will be compatible with your `Zlib::Inflate.new(-Zlib::MAX_WBITS)`. If it compresses to the zlib format, then it would be compatible with `Zlib::Inflate.new(Zlib::MAX_WBITS)` (i.e. without the minus sign). Or you can delete the first two bytes and last four bytes to get back to a deflate stream.
The `DeflateStream` class in .NET is a little odd in that its `CompressionLevel` is an `enum` with only three options, instead of the ten levels provided by zlib (0..9). The three options are `Optimal`, `Fastest`, and `NoCompression`. The last must be 0, the first is probably 9, and the middle one might be 1 or 3. In any case, there is no option for the default compression level! That level (6) is a very good balance of compression vs. time.
You might want to consider using DotNetZip instead. It provides a complete interface to zlib, so that you can specify exactly what you want to do, and know what will happen.
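On the Ruby side, a minimal sketch of the deflate counterpart to the question's inflate method is a `Zlib::Deflate` stream created with the same negative window bits. Defaulting to level 9 (`Zlib::BEST_COMPRESSION`) is an assumption meant to mirror the `Deflater.BEST_COMPRESSION` setting in the question's C# code; it guarantees output this `inflate` can read, not bytes identical to a non-zlib compressor. `zlib_to_raw` is a hypothetical helper for the header/trailer stripping described above, and the last line checks that zlib's default level is the same as level 6.

```ruby
require 'zlib'

# The question's inflate: negative window bits = raw deflate, no header/trailer.
def inflate(string)
  zstream = Zlib::Inflate.new(-Zlib::MAX_WBITS)
  buf = zstream.inflate(string)
  zstream.finish
  zstream.close
  buf
end

# Matching deflate: same negative window bits, so the output is raw deflate.
# Level 9 default is an assumption mirroring Deflater.BEST_COMPRESSION.
def deflate(string, level = Zlib::BEST_COMPRESSION)
  zstream = Zlib::Deflate.new(level, -Zlib::MAX_WBITS)
  buf = zstream.deflate(string, Zlib::FINISH)
  zstream.close
  buf
end

# Hypothetical helper: turn zlib-format data into raw deflate by dropping the
# 2-byte header and the 4-byte Adler-32 trailer (assumes no preset dictionary).
def zlib_to_raw(zlib_data)
  zlib_data.byteslice(2, zlib_data.bytesize - 6)
end

text = "What is the common difference in this arithmetic sequence?\n\n\\indenttext{11, 15, 19,\\dots}"

raw = deflate(text)
puts inflate(raw) == text                    # true

# Zlib::Deflate.deflate produces the zlib format (with header and trailer).
wrapped = Zlib::Deflate.deflate(text, Zlib::BEST_COMPRESSION)
puts Zlib::Inflate.inflate(wrapped) == text  # true: default (positive) window bits expect the wrapper
puts inflate(zlib_to_raw(wrapped)) == text   # true: wrapper stripped, raw inflate works

# zlib treats DEFAULT_COMPRESSION (-1) as level 6.
puts deflate(text, Zlib::DEFAULT_COMPRESSION) == deflate(text, 6)  # true
```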
It really depends on how it was compressed in C#. Before .NET 4.5, the
`System.IO.Compression.DeflateStream`/`GZipStream`
classes used a Microsoft implementation of DEFLATE that differed significantly from zlib (which means you probably can't emulate it easily with zlib). It was much worse in almost all cases, so in .NET 4.5 they replaced it with zlib, which should be able to match what you can do in Ruby.
If you know which version of .NET generated the string, you can determine whether or not you can get back to the original bytes. If it was generated with .NET 4.5, you should be able to do a standard deflate with the same settings to get the same bytes.
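One way to test this reasoning on your own data is to decompress it and re-compress it at every zlib level, looking for a byte-for-byte match. This is a minimal sketch, assuming the data is raw deflate (as the question's inflate method expects) and that only the level varies, with zlib's default window size, memory level, and strategy; `matching_levels` is a hypothetical helper.

```ruby
require 'zlib'

def raw_deflate(data, level)
  z = Zlib::Deflate.new(level, -Zlib::MAX_WBITS)
  out = z.deflate(data, Zlib::FINISH)
  z.close
  out
end

def raw_inflate(data)
  z = Zlib::Inflate.new(-Zlib::MAX_WBITS)
  out = z.inflate(data)
  z.finish
  z.close
  out
end

# Hypothetical helper: decompress, re-deflate at each level 0..9, and return
# the levels whose output is byte-for-byte identical to the original.
def matching_levels(original_compressed)
  original = original_compressed.b  # compare as raw bytes regardless of encoding tag
  plain = raw_inflate(original)
  (0..9).select { |level| raw_deflate(plain, level) == original }
end

# Self-check with data that zlib itself produced at level 9.
sample = raw_deflate("11, 15, 19, 23, 27, 31, 35\n" * 10, 9)
p matching_levels(sample)  # includes 9
```

For bytes produced by a non-zlib implementation, such as pre-4.5 .NET, an empty array is the expected result; a non-empty one tells you which zlib levels reproduce the original exactly.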