Gunzipping Contents of a URL - Java

Posted 2019-08-31 02:19

Question:

So as the title suggests, I'm trying to get and gunzip a string from an HTTP request.

urlConn = url.openConnection();
int len = CONTENT_LENGTH;
byte[] gbytes = new byte[len];
gbuffer = new GZIPInputStream(urlConn.getInputStream(), len);
System.out.println(gbuffer.read(gbytes)+"/"+len);
System.out.println(gbytes);
result = new String(gbytes, "UTF-8");
gbuffer.close();
System.out.println(result);

With some URLs, it works fine. I get output like this:

42/42
[B@96e8209
The entire 42 bytes of my data. Abcdefghij.

With others, it gives me something like the following output:

22/77
[B@1d94882
The entire 77 bytes of

As you can see, the first several bytes of data are nearly identical in both cases, so the content itself shouldn't be causing the problem. I can't pin it down. Increasing CONTENT_LENGTH doesn't help, and data streams both larger and smaller than the failing ones work fine.

EDIT: The issue also does not lie within the raw gzipped data, as Cocoa and Python both gunzip it without issue.

EDIT: Solved. Including final code:

urlConn = url.openConnection();
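int offset = 0, len = CONTENT_LENGTH;
byte[] gbytes = new byte[len];
gbuffer = new GZIPInputStream(urlConn.getInputStream(), len);
// read() may return fewer bytes than requested, so loop until the buffer is full.
while (offset < len)
{
    // The third argument is the number of bytes still wanted: len - offset.
    int n = gbuffer.read(gbytes, offset, len - offset);
    if (n < 0) break; // end of stream reached before len bytes arrived
    offset += n;
}
// Decode only the bytes that were actually read.
result = new String(gbytes, 0, offset, "UTF-8");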
gbuffer.close();

Answer 1:

It's possible that the data isn't available in the stream. The first println() you have says you've only read 22 bytes, so only 22 bytes were available when you called read(). You can try looping until you've read CONTENT_LENGTH worth of bytes. Maybe something like:

int index = 0;
int bytesRead = gbuffer.read(gbytes);
while(bytesRead>0 && index<len) {
    index += bytesRead;
    bytesRead = gbuffer.read(gbytes,index,len-index);
}
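
For reference, here is a minimal self-contained sketch of the same loop for the case where the decompressed length is known in advance. The class name ReadFullyDemo and the helpers readExactly and gzip are made up for illustration, and the input is gzipped in memory rather than fetched from a URL, so treat it as a sketch under those assumptions rather than a drop-in replacement:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class ReadFullyDemo {
    // Hypothetical helper: reads exactly len bytes, looping because a single
    // read() call may return fewer bytes than were requested.
    static byte[] readExactly(InputStream in, int len) throws IOException {
        byte[] out = new byte[len];
        int offset = 0;
        while (offset < len) {
            int n = in.read(out, offset, len - offset);
            if (n < 0) {
                // The stream ended before len bytes were delivered.
                throw new EOFException("got " + offset + " of " + len + " bytes");
            }
            offset += n;
        }
        return out;
    }

    // Demo-only helper: gzips a string in memory to stand in for the HTTP body.
    static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        String text = "Some sample data to compress and read back.";
        int len = text.getBytes(StandardCharsets.UTF_8).length;
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(gzip(text)))) {
            byte[] plain = readExactly(in, len);
            System.out.println(new String(plain, StandardCharsets.UTF_8));
        }
    }
}
```

Throwing EOFException on a short stream is a design choice here: it makes a wrong length visible instead of silently returning a partly filled buffer.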


Answer 2:

GZIPInputStream.read() is not guaranteed to read all data in one call. You should use a loop:

byte[] buf = new byte[1024];
int len = 0, total = 0;
while ((len = gbuffer.read(buf)) > 0) {
    total += len;
    // do something with data
}
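
To fill in the "do something with data" part, one option is to append each chunk to a ByteArrayOutputStream, which also removes the need to know the decompressed length up front. This is only a sketch: the class name GunzipAllDemo and the helpers gunzipToString and gzip are hypothetical, and the compressed bytes are produced in memory instead of coming from urlConn.getInputStream():

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class GunzipAllDemo {
    // Hypothetical helper: decompresses the whole stream and decodes it as UTF-8.
    static String gunzipToString(InputStream compressed) throws IOException {
        try (GZIPInputStream gz = new GZIPInputStream(compressed)) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[1024];
            int n;
            // read() returns -1 only at end of stream; until then it may
            // deliver anywhere from 1 to buf.length bytes per call.
            while ((n = gz.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        }
    }

    // Demo-only helper: gzips a string in memory to stand in for the HTTP body.
    static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] body = gzip("Some sample data to compress and read back.");
        System.out.println(gunzipToString(new ByteArrayInputStream(body)));
    }
}
```

With a real connection, the call would presumably look like gunzipToString(urlConn.getInputStream()), assuming the server actually sends a gzip-encoded body.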