We have a requirement to determine whether an incoming InputStream refers to a zip file or zip data. We do not have a reference to the underlying source of the stream. We aim to copy the contents of this stream into an OutputStream directed at an alternate location.
I tried reading the stream using ZipInputStream and extracting a ZipEntry. The ZipEntry is null if the stream is a regular file, as expected. However, in checking for a ZipEntry I lose the first few bytes of the stream. Hence, by the time I know that the stream is a regular stream, I have already lost its initial data.
Any thoughts around how to check if the InputStream is an archive without data loss would be helpful.
Thanks.
You can check the first bytes of the stream for the ZIP local file header signature (PK 0x03 0x04); that would be enough for most cases. If you need more precision, you should take the last ~100 bytes and check for the end of central directory record (signature PK 0x05 0x06).
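A minimal sketch of the signature check, assuming the stream supports mark/reset or can be wrapped in a BufferedInputStream so the peeked bytes are not lost. The class and method names (ZipSignature, markable, startsWithZipSignature) are made up for illustration. Note that this only looks at the local file header, so an empty archive (which starts with PK 0x05 0x06) would not be recognized.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

// Hypothetical helper, not part of any library.
final class ZipSignature {
    // Local file header signature: 'P', 'K', 0x03, 0x04.
    private static final byte[] LOCAL_HEADER = {0x50, 0x4B, 0x03, 0x04};

    /** Returns the stream itself if it supports mark/reset, else a buffered wrapper. */
    static InputStream markable(InputStream in) {
        return in.markSupported() ? in : new BufferedInputStream(in);
    }

    /** Peeks at the first four bytes, then rewinds so no data is consumed. */
    static boolean startsWithZipSignature(InputStream in) throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        in.mark(LOCAL_HEADER.length);
        try {
            byte[] head = new byte[LOCAL_HEADER.length];
            int n = 0;
            // read() may return fewer bytes than requested, so loop until full or EOF.
            while (n < head.length) {
                int r = in.read(head, n, head.length - n);
                if (r < 0) {
                    break;
                }
                n += r;
            }
            return n == head.length && Arrays.equals(head, LOCAL_HEADER);
        } finally {
            in.reset(); // restore the position recorded by mark()
        }
    }
}
```

Callers would keep using the stream returned by markable(...) afterwards, since any wrapper holds the bytes that were already pulled from the original stream.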
This is how I did it.
Use mark/reset to restore the stream if GZIPInputStream detects an incorrect format (it throws a ZipException).
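A minimal sketch of that mark/reset approach (not necessarily the code this answer used). GzipProbe, unwrapIfGzip and READ_LIMIT are illustrative names, and the read limit is an assumed value. Keep in mind that GZIPInputStream recognizes gzip data (magic bytes 0x1f 0x8b), which is a different format from a ZIP archive, so this fits the case where the incoming data may be gzip-compressed.

```java
import java.io.BufferedInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.ZipException;

// Hypothetical helper, not part of any library.
final class GzipProbe {
    // Assumed upper bound on how many bytes the header check may consume.
    private static final int READ_LIMIT = 8192;

    /**
     * Returns a GZIPInputStream if the data starts with a gzip header,
     * otherwise the rewound original data.
     */
    static InputStream unwrapIfGzip(InputStream in) throws IOException {
        InputStream buffered = in.markSupported() ? in : new BufferedInputStream(in);
        buffered.mark(READ_LIMIT);
        try {
            // The constructor reads and validates the gzip header.
            return new GZIPInputStream(buffered);
        } catch (ZipException | EOFException e) {
            // Not gzip (or too short to hold a header): rewind and hand back the raw data.
            buffered.reset();
            return buffered;
        }
    }
}
```

If the header check ever read more than READ_LIMIT bytes before failing, reset() could throw an IOException, so the limit would need tuning for real use.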
You have described a java.io.PushbackInputStream - in addition to read(), it has an unread(byte[]) which allows you to push bytes back to the front of the stream and to read() them again.
It's in java.io since JDK1.0 (though I admit I haven't seen a use for it until today).
It sounds a bit like a hack, but you could implement a proxy java.io.InputStream to sit between ZipInputStream and the stream you originally passed to ZipInputStream's constructor. Your proxy would stream to a buffer until you know whether it's a ZIP file or not. If not, then the buffer saves your day.
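The PushbackInputStream suggestion could look roughly like this. It is a sketch under assumptions: PushbackSniffer, wrap and isZip are hypothetical names, and the check is limited to the four-byte local file header signature. The pushback buffer plays the role of the proxy's buffer: the peeked bytes are handed back before anyone else reads the stream.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;

// Hypothetical helper, not part of any library.
final class PushbackSniffer {
    private static final int SIG_LEN = 4;

    /** Wraps the stream with a pushback buffer large enough for the signature. */
    static PushbackInputStream wrap(InputStream in) {
        return new PushbackInputStream(in, SIG_LEN);
    }

    /** Reads up to four bytes, pushes them back, and reports whether they are PK 0x03 0x04. */
    static boolean isZip(PushbackInputStream in) throws IOException {
        byte[] head = new byte[SIG_LEN];
        int n = 0;
        // read() may return fewer bytes than requested, so loop until full or EOF.
        while (n < SIG_LEN) {
            int r = in.read(head, n, SIG_LEN - n);
            if (r < 0) {
                break;
            }
            n += r;
        }
        if (n > 0) {
            in.unread(head, 0, n); // put the bytes back at the front of the stream
        }
        return n == SIG_LEN
                && head[0] == 'P' && head[1] == 'K'
                && head[2] == 0x03 && head[3] == 0x04;
    }
}
```

After the check, the same PushbackInputStream can be passed to a ZipInputStream or copied straight to the OutputStream, with no bytes missing.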
Assuming your original InputStream is not buffered, I would try wrapping it in a BufferedInputStream before wrapping that in a ZipInputStream to check. You can use mark() and reset() on the BufferedInputStream to return to the initial position in the stream after your check.
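Here is one way that might be written, as a sketch rather than a definitive implementation. ZipEntryProbe, hasZipEntry and READ_LIMIT are illustrative names. The read limit assumes the check reads no further than the first local file header (30 bytes plus a name and an extra field, each at most 65535 bytes long). A small non-closing wrapper is used so that closing the ZipInputStream does not close the underlying stream.

```java
import java.io.BufferedInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipException;
import java.util.zip.ZipInputStream;

// Hypothetical helper, not part of any library.
final class ZipEntryProbe {
    // Assumption: fixed 30-byte header + max name length + max extra field length.
    private static final int READ_LIMIT = 30 + 2 * 0xFFFF;

    /** Asks ZipInputStream for a first entry, then rewinds the buffered stream. */
    static boolean hasZipEntry(BufferedInputStream in) throws IOException {
        in.mark(READ_LIMIT);
        // Shield that ignores close(), so the probe can be closed without closing `in`.
        InputStream shield = new FilterInputStream(in) {
            @Override
            public void close() {
                // intentionally empty
            }
        };
        try (ZipInputStream zin = new ZipInputStream(shield)) {
            return zin.getNextEntry() != null;
        } catch (ZipException e) {
            return false; // looked like a header but was not valid ZIP data
        } finally {
            in.reset(); // back to the position recorded by mark()
        }
    }
}
```

The caller then reads from the same BufferedInputStream, either through a fresh ZipInputStream or as plain bytes to copy to the OutputStream.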