Checking if a stream is a zip file

Posted 2019-03-30 05:22

Question:

We need to determine whether an incoming InputStream refers to a zip file or zip data. We do not have a reference to the underlying source of the stream. Our aim is to copy the contents of this stream into an OutputStream directed at another location.

I tried reading the stream using ZipInputStream and extracting a ZipEntry. As expected, the ZipEntry is null if the stream is a regular file; however, checking for a ZipEntry consumes the first few bytes of the stream. So by the time I know that the stream is a regular stream, I have already lost its initial data.

Any thoughts on how to check whether the InputStream is an archive without losing data would be helpful.

Thanks.

Answer 1:

Assuming your original input stream is not buffered, I would wrap it in a BufferedInputStream before wrapping that in a ZipInputStream for the check. Call mark() on the BufferedInputStream before the check and reset() afterwards to return to the initial position in the stream. The mark limit has to be at least as large as the number of bytes the check reads.
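A minimal sketch of this approach, assuming a single getNextEntry() call is enough to decide. The class name ZipMarkResetCheck and the 256 KiB mark limit are illustrative choices, not from the answer: for the first entry, getNextEntry() reads the 30-byte local file header plus the entry's file name and extra field (each up to 65535 bytes), so the limit is sized to cover that.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.util.zip.ZipException;
import java.util.zip.ZipInputStream;

public class ZipMarkResetCheck {
    // Illustrative limit: 30-byte header + up to 64 KiB name + up to 64 KiB extra field.
    private static final int MARK_LIMIT = 256 * 1024;

    /** Returns true if the stream starts with a ZIP entry; the stream is rewound either way. */
    public static boolean looksLikeZip(BufferedInputStream in) throws IOException {
        in.mark(MARK_LIMIT);
        try {
            // Deliberately not closed: closing the ZipInputStream would close 'in' too.
            ZipInputStream zin = new ZipInputStream(in);
            // null means no local file header signature (or fewer than 30 bytes).
            return zin.getNextEntry() != null;
        } catch (ZipException e) {
            // The header looked like ZIP but the entry was malformed.
            return false;
        } finally {
            // Rewind to the first byte so nothing is lost for the later copy.
            in.reset();
        }
    }
}
```

After the check, copy from the BufferedInputStream itself (not from the ZipInputStream) to the OutputStream; it is positioned at the first byte again.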



Answer 2:

This is how I did it.

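I use mark/reset to restore the stream when GZIPInputStream rejects the data as not being in gzip format (its constructor throws a ZipException). Note that gzip (.gz) is a different format from a ZIP archive (.zip); for .zip data, ZipInputStream is the class to use.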

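import java.io.BufferedInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.ZipException;

/**
 * Wraps the input stream with GZIPInputStream if it starts with a gzip header.
 * @param inputStream the incoming stream
 * @return a decompressing stream for gzip data, otherwise the (possibly buffered)
 *         original stream positioned at its first byte
 * @throws IOException if reading the header or resetting the stream fails
 */
private InputStream wrapIfZip(InputStream inputStream) throws IOException {
    // mark/reset needs a stream that supports it; BufferedInputStream does.
    if (!inputStream.markSupported()) {
        inputStream = new BufferedInputStream(inputStream);
    }
    // Non-gzip data is rejected after its first two bytes, so 1000 is ample.
    inputStream.mark(1000);
    try {
        // The constructor reads and validates the gzip header immediately.
        return new GZIPInputStream(inputStream);
    } catch (ZipException | EOFException e) {
        // ZipException: not gzip. EOFException: stream shorter than the header.
        inputStream.reset();
        return inputStream;
    }
}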


Answer 3:

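You can check the first bytes of the stream for the ZIP local file header signature, "PK" followed by 0x03 0x04 (bytes 0x50 0x4B 0x03 0x04); that is enough for most cases. If you need more precision, you can examine the last ~100 bytes and check the fields of the end of central directory record (signature 0x50 0x4B 0x05 0x06), although with a stream this means buffering it to the end.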
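A sketch of the first-bytes check, peeking through mark/reset on a BufferedInputStream. The class and method names are hypothetical. Only the four-byte local file header signature is tested, so an empty archive (which consists solely of an end of central directory record starting with 0x50 0x4B 0x05 0x06) would not be recognized.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.util.Arrays;

public class ZipSignatureCheck {
    // Local file header signature: 'P' 'K' 0x03 0x04.
    private static final byte[] LOCAL_HEADER_SIG = {0x50, 0x4B, 0x03, 0x04};

    /** Peeks at the first four bytes and rewinds; true if they match the signature. */
    public static boolean hasZipLocalHeader(BufferedInputStream in) throws IOException {
        byte[] head = new byte[LOCAL_HEADER_SIG.length];
        in.mark(head.length);
        int n;
        try {
            // May return fewer than four bytes if the stream is that short.
            n = in.readNBytes(head, 0, head.length);
        } finally {
            in.reset();
        }
        return n == head.length && Arrays.equals(head, LOCAL_HEADER_SIG);
    }
}
```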



Answer 4:

It sounds a bit like a hack, but you could implement a proxy java.io.InputStream that sits between ZipInputStream and the stream you originally passed to ZipInputStream's constructor. The proxy copies everything it reads into a buffer until you know whether the data is a ZIP file or not. If it is not, the buffer saves your day: you can replay the buffered bytes ahead of the rest of the original stream.
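A minimal sketch of such a proxy, assuming the check is a single ZipInputStream.getNextEntry() call. RecordingInputStream, replay() and ProxyZipCheck are hypothetical names; replay() uses java.io.SequenceInputStream to put the recorded bytes back in front of the unread remainder of the original stream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.util.zip.ZipException;
import java.util.zip.ZipInputStream;

public class ProxyZipCheck {

    /** Hypothetical proxy that records every byte read through it. */
    public static final class RecordingInputStream extends FilterInputStream {
        private final ByteArrayOutputStream recorded = new ByteArrayOutputStream();

        public RecordingInputStream(InputStream source) {
            super(source);
        }

        @Override
        public int read() throws IOException {
            int b = in.read();
            if (b != -1) {
                recorded.write(b);
            }
            return b;
        }

        @Override
        public int read(byte[] buf, int off, int len) throws IOException {
            int n = in.read(buf, off, len);
            if (n > 0) {
                recorded.write(buf, off, n);
            }
            return n;
        }

        @Override
        public long skip(long n) throws IOException {
            // Route skips through read() so skipped bytes are recorded too.
            long skipped = 0;
            while (skipped < n && read() != -1) {
                skipped++;
            }
            return skipped;
        }

        @Override
        public boolean markSupported() {
            // Delegated mark/reset would bypass the recording, so advertise no support.
            return false;
        }

        /** The recorded bytes followed by the unread remainder of the source. */
        public InputStream replay() {
            return new SequenceInputStream(new ByteArrayInputStream(recorded.toByteArray()), in);
        }
    }

    /** Returns true if the data seen through the proxy starts with a ZIP entry. */
    public static boolean looksLikeZip(RecordingInputStream proxy) throws IOException {
        try {
            // Not closed on purpose: closing would also close the proxy's source.
            return new ZipInputStream(proxy).getNextEntry() != null;
        } catch (ZipException e) {
            return false;
        }
    }
}
```

Whatever the result, the stream to copy afterwards is proxy.replay(), which yields the original bytes from the first one onwards.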



Answer 5:

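You have described a java.io.PushbackInputStream: in addition to read(), it has unread(byte[]), which lets you push bytes back to the front of the stream and read() them again. Construct it with a pushback buffer large enough for the bytes you peek at, because the single-argument constructor gives a buffer of only one byte.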

It has been in java.io since JDK 1.0 (though I admit I hadn't seen a use for it until today).
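A sketch of peeking at the ZIP local file header signature with PushbackInputStream instead of mark/reset. The names PushbackZipCheck, peekable and startsWithZipHeader are hypothetical, and checking only the four-byte signature 0x50 0x4B 0x03 0x04 is an assumption about what counts as "zip data".

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.util.Arrays;

public class PushbackZipCheck {
    // Local file header signature: 'P' 'K' 0x03 0x04.
    private static final byte[] LOCAL_HEADER_SIG = {0x50, 0x4B, 0x03, 0x04};

    /** Wraps the source with a pushback buffer big enough to hold the signature. */
    public static PushbackInputStream peekable(InputStream source) {
        return new PushbackInputStream(source, LOCAL_HEADER_SIG.length);
    }

    /** Reads up to four bytes, pushes them back, and reports whether they match. */
    public static boolean startsWithZipHeader(PushbackInputStream in) throws IOException {
        byte[] head = new byte[LOCAL_HEADER_SIG.length];
        int n = in.readNBytes(head, 0, head.length);
        if (n > 0) {
            // Push back exactly the bytes that were read, so the next read() sees them again.
            in.unread(head, 0, n);
        }
        return n == head.length && Arrays.equals(head, LOCAL_HEADER_SIG);
    }
}
```

Afterwards, keep reading from the PushbackInputStream rather than the original stream, since the pushed-back bytes live in its buffer.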



Tags: java stream zip