I've been trying all day to make this thing work, but it's still not right. I've checked so many posts around here and tested so many different implementations that I don't know where to look now...
Here is my situation: I have a small PHP test file (gz.php) on my server which looks like this:
header("Content-Encoding: gzip");
print("\x1f\x8b\x08\x00\x00\x00\x00\x00");
$contents = gzcompress("Is it working?", 9);
print($contents);
This is the simplest version I could come up with, and it works fine in any web browser.
Now I have an Android activity using Jsoup that has this code:
URL url = new URL("http://myServerAdress.com/gz.php");
doc = Jsoup.parse(url, 1000);
This throws an EOFException with an empty message on the "Jsoup.parse" line.
I've read everywhere that Jsoup is supposed to parse gzipped content without having to do anything special, but obviously, there's something missing.
I've tried many other approaches, like using Jsoup.connect().get() or InputStream, GZIPInputStream and DataInputStream. I also tried PHP's gzdeflate() and gzencode() functions, but no luck either. I even tried leaving out the Content-Encoding header in PHP and inflating the content myself afterwards... but that was about as effective as it was clever...
It has to be something "stupid" I'm missing, but I just can't tell what... does anybody have an idea?
(PS: I'm using Jsoup 1.7.0, which is the latest version as of now.)
The asker indicated in a comment that gzcompress was writing a CRC that was both incorrect and incomplete, according to information from here. The operative code is:
// Display the header of the gzip file
// Thanks ck@medienkombinat.de!
// Only display this once
echo "\x1f\x8b\x08\x00\x00\x00\x00\x00";
// Figure out the size and CRC of the original for later
$Size = strlen($contents);
$Crc = crc32($contents);
// Compress the data
$contents = gzcompress($contents, 9);
// We can't just output it here, since the CRC is messed up.
// If I try to "echo $contents" at this point, the compressed
// data is sent, but not completely. There are four bytes at
// the end that are a CRC. Three are sent. The last one is
// left in limbo. Also, if we "echo $contents", then the next
// byte we echo will not be sent to the client. I am not sure
// if this is a bug in 4.0.2 or not, but the best way to avoid
// this is to put the correct CRC at the end of the compressed
// data. (The one generated by gzcompress looks WAY wrong.)
// This will stop Opera from crashing, gunzip will work, and
// other browsers won't keep loading indefinitely.
//
// Strip off the old CRC (it's there, but it won't be displayed
// all the way -- very odd)
$contents = substr($contents, 0, strlen($contents) - 4);
// Show only the compressed data
echo $contents;
// Output the CRC, then the size of the original
gzip_PrintFourChars($Crc);
gzip_PrintFourChars($Size);
Jonathan Hedley commented, "jsoup just uses a normal Java GZIPInputStream to parse the gzip, so you'd hit that issue with any Java program." The EOFException is presumably due to the incomplete CRC.
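To see the effect outside of Android and jsoup, here is a minimal Java sketch that rebuilds both byte streams and feeds them to GZIPInputStream. It rests on a few assumptions: java.util.zip.Deflater stands in for PHP's gzcompress (both emit the zlib format of RFC 1950, i.e. a 2-byte header, deflate data, and a 4-byte Adler-32 checksum), and the class and method names are made up for illustration. Note that gz.php prints only 8 header bytes while a gzip header is 10 bytes long, so the two zlib header bytes end up filling the last two header fields. What remains wrong is the trailer: 4 bytes of Adler-32 where gzip expects 8 bytes (CRC-32, then the uncompressed size, both little-endian).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;
import java.util.zip.Deflater;
import java.util.zip.GZIPInputStream;

final class GzipTrailerDemo {
    // The same 8 bytes gz.php prints before the compressed data.
    static final byte[] HEADER = {0x1f, (byte) 0x8b, 0x08, 0, 0, 0, 0, 0};

    // Assumed stand-in for PHP's gzcompress($data, 9): a zlib (RFC 1950) stream.
    static byte[] zlibCompress(byte[] data) {
        Deflater deflater = new Deflater(9);
        deflater.setInput(data);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[256];
        while (!deflater.finished()) {
            out.write(buf, 0, deflater.deflate(buf));
        }
        deflater.end();
        return out.toByteArray();
    }

    // What gz.php sends: the header bytes followed by the untouched zlib stream.
    static byte[] asSentByGzPhp(byte[] data) {
        byte[] z = zlibCompress(data);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(HEADER, 0, HEADER.length);
        out.write(z, 0, z.length);
        return out.toByteArray();
    }

    // The fix quoted above: drop the last 4 bytes, then append CRC-32 and size.
    static byte[] withGzipTrailer(byte[] data) {
        byte[] z = zlibCompress(data);
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(HEADER, 0, HEADER.length);
        out.write(z, 0, z.length - 4);          // strip the Adler-32 checksum
        writeLittleEndian(out, crc.getValue()); // CRC-32 of the original data
        writeLittleEndian(out, data.length);    // size of the original data
        return out.toByteArray();
    }

    // Writes the low 32 bits of value, least significant byte first.
    static void writeLittleEndian(ByteArrayOutputStream out, long value) {
        for (int i = 0; i < 4; i++) {
            out.write((int) (value >>> (8 * i)) & 0xff);
        }
    }

    // Decodes a gzip stream to text; I/O failures are rethrown unchecked.
    static String gunzip(byte[] gz) {
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(gz))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[256];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = "Is it working?".getBytes(StandardCharsets.UTF_8);
        try {
            gunzip(asSentByGzPhp(data));
            System.out.println("as sent by gz.php: decoded without error");
        } catch (UncheckedIOException e) {
            System.out.println("as sent by gz.php: " + e.getCause());
        }
        System.out.println("with gzip trailer: " + gunzip(withGzipTrailer(data)));
    }
}
```

The first stream should fail and the second should decode back to the original text. Which exception the first one raises depends on the runtime's GZIPInputStream: the asker's Android device reported an EOFException, while other implementations may complain about a corrupt trailer instead. Either way it is an IOException raised while reading the trailer.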