I'm using GZipStream to compress and decompress data. I chose it over DeflateStream because the documentation states that GZipStream also adds a CRC to detect corrupt data, which is another feature I wanted. My "positive" unit tests are working well: I can compress some data, save the compressed byte array, and then successfully decompress it again. The .NET GZipStream compress and decompress problem post helped me realize that I needed to close the GZipStream before accessing the compressed or decompressed data.
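For context, the round-trip pattern I settled on looks like this (a minimal sketch; the GZipRoundTrip class and its method names are just illustrative, not part of the test below):

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

static class GZipRoundTrip
{
    // Compress: the GZipStream must be closed (disposed) before the
    // compressed bytes are read, otherwise the gzip trailer -- which
    // includes the CRC32 -- is never flushed to the underlying stream.
    public static byte[] Compress(byte[] data)
    {
        using (var ms = new MemoryStream())
        {
            using (var gz = new GZipStream(ms, CompressionMode.Compress))
            {
                gz.Write(data, 0, data.Length);
            } // Dispose flushes the deflate block and writes the trailer
            return ms.ToArray();
        }
    }

    public static byte[] Decompress(byte[] compressed)
    {
        using (var input = new MemoryStream(compressed))
        using (var gz = new GZipStream(input, CompressionMode.Decompress))
        using (var output = new MemoryStream())
        {
            gz.CopyTo(output);
            return output.ToArray();
        }
    }

    static void Main()
    {
        var original = Encoding.ASCII.GetBytes("round-trip sample");
        var restored = Decompress(Compress(original));
        Console.WriteLine(Encoding.ASCII.GetString(restored)); // prints "round-trip sample"
    }
}
```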
Next, I continued to write a "negative" unit test to be sure corrupt data could be detected. I had previously used the example for the GZipStream class from MSDN to compress a file, open the compressed file with a text editor, change a byte to corrupt it (as if opening it with a text editor wasn't bad enough!), save it and then decompress it to be sure that I got an InvalidDataException as expected.
When I wrote the unit test, I picked an arbitrary byte to corrupt (e.g., compressedDataBytes[50] = 0x99) and got an InvalidDataException. So far so good. I was curious, so I chose another byte, but to my surprise I did not get an exception. This may be okay (e.g., I coincidentally hit an unused byte in a block of data), so long as the data could still be recovered successfully. However, I didn't get the correct data back either!
To be sure "it wasn't me", I took the cleaned-up code from the bottom of .NET GZipStream compress and decompress problem and modified it to sequentially corrupt each byte of the compressed data until it failed to decompress properly. Here are the changes (note that I'm using the Visual Studio 2010 test framework):
// successful compress / decompress example code from:
// https://stackoverflow.com/questions/1590846/net-gzipstream-compress-and-decompress-problem
[TestMethod]
public void Test_zipping_with_memorystream_and_corrupting_compressed_data()
{
    const string sample = "This is a compression test of microsoft .net gzip compression method and decompression methods";
    var encoding = new ASCIIEncoding();
    var data = encoding.GetBytes(sample);
    string sampleOut = null;
    byte[] cmpData;

    // Compress
    using (var cmpStream = new MemoryStream())
    {
        using (var hgs = new GZipStream(cmpStream, CompressionMode.Compress))
        {
            hgs.Write(data, 0, data.Length);
        }
        cmpData = cmpStream.ToArray();
    }

    int corruptBytesNotDetected = 0;

    // corrupt data byte by byte
    for (var byteToCorrupt = 0; byteToCorrupt < cmpData.Length; byteToCorrupt++)
    {
        // corrupt the data
        cmpData[byteToCorrupt]++;

        using (var decomStream = new MemoryStream(cmpData))
        {
            using (var hgs = new GZipStream(decomStream, CompressionMode.Decompress))
            {
                using (var reader = new StreamReader(hgs))
                {
                    try
                    {
                        sampleOut = reader.ReadToEnd();

                        // if we get here, the corrupt data was not detected by GZipStream
                        // ... okay so long as the correct data is extracted
                        corruptBytesNotDetected++;
                        var message = string.Format("ByteCorrupted = {0}, CorruptBytesNotDetected = {1}",
                            byteToCorrupt, corruptBytesNotDetected);
                        Assert.IsNotNull(sampleOut, message);
                        Assert.AreEqual(sample, sampleOut, message);
                    }
                    catch (InvalidDataException)
                    {
                        // data was corrupted, so we expect to get here
                    }
                }
            }
        }

        // restore the data
        cmpData[byteToCorrupt]--;
    }
}
When I run this test, I get:
Assert.AreEqual failed. Expected:<This is a compression test of microsoft .net gzip compression method and decompression methods>. Actual:<>. ByteCorrupted = 11, CorruptBytesNotDetected = 8
So this means there were actually 7 cases where corrupting a byte made no difference (the string was still recovered successfully), but corrupting byte 11 neither threw an exception nor recovered the correct data.
Am I missing something or doing something wrong? Can anyone see why the corrupt compressed data is not being detected?