Given
- URL of an archive (e.g. a zip file)
- Full name (including path) of a file inside that archive
I'm looking for a way (preferably in Java) to create a local copy of that file, without downloading the entire archive first.
From my (limited) understanding it should be possible, though I have no idea how to do that. I've been using TrueZip, since it seems to support a large variety of archive types, but I have doubts about its ability to work in such a way. Does anyone have any experience with that sort of thing?
EDIT: Being able to do this with tarballs and compressed tarballs as well is important to me.
Contrary to the other answers here, I'd like to point out that ZIP entries are compressed individually, so (in theory) you don't need to download anything more than the central directory (the index at the end of the archive) and the entry itself. The server would need to support the `Range` HTTP header for this to work.

The standard Java API only supports reading ZIP files from local files and input streams. As far as I know, there's no provision for reading from random-access remote files.
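A minimal sketch of the directory-plus-entry idea using only JDK classes (`HttpURLConnection` for the `Range` requests) instead of TrueZIP. `RemoteZipEntry`, `RangeReader`, `http` and `extract` are hypothetical names. It assumes the server reports `Content-Length` for a `HEAD` request and answers `Range: bytes=a-b` with `206 Partial Content`, and it does not handle ZIP64, encrypted entries, or compression methods other than STORED and DEFLATE. It reads the end-of-central-directory record from the archive's tail, walks the central directory to find the entry, then fetches only that entry's local header and data.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Inflater;

public class RemoteZipEntry {

    /** Random access to a byte source (hypothetical helper interface). */
    public interface RangeReader {
        long size() throws IOException;
        byte[] read(long offset, int length) throws IOException;
    }

    /** Assumes the server answers HEAD with Content-Length and honours "Range: bytes=a-b". */
    public static RangeReader http(URL url) {
        return new RangeReader() {
            public long size() throws IOException {
                HttpURLConnection c = (HttpURLConnection) url.openConnection();
                c.setRequestMethod("HEAD");
                long len = c.getContentLengthLong();
                if (len < 0) throw new IOException("no Content-Length");
                return len;
            }
            public byte[] read(long offset, int length) throws IOException {
                if (length == 0) return new byte[0];
                HttpURLConnection c = (HttpURLConnection) url.openConnection();
                c.setRequestProperty("Range", "bytes=" + offset + "-" + (offset + length - 1));
                if (c.getResponseCode() != HttpURLConnection.HTTP_PARTIAL)
                    throw new IOException("Range not supported (HTTP " + c.getResponseCode() + ")");
                byte[] buf = new byte[length];
                try (InputStream in = c.getInputStream()) {
                    new DataInputStream(in).readFully(buf);
                }
                return buf;
            }
        };
    }

    /** Fetches the tail (for the directory), one local header and the entry's data. */
    public static byte[] extract(RangeReader src, String entryName) throws IOException {
        long size = src.size();
        // End Of Central Directory record: 22 bytes plus a comment of at most 65535 bytes.
        int tailLen = (int) Math.min(size, 22 + 65535);
        ByteBuffer tail = le(src.read(size - tailLen, tailLen));
        int eocd = tailLen - 22;
        while (eocd >= 0 && tail.getInt(eocd) != 0x06054b50) eocd--;
        if (eocd < 0) throw new IOException("no End Of Central Directory record");
        int cdSize = tail.getInt(eocd + 12);                   // ZIP64 is not handled
        long cdOffset = tail.getInt(eocd + 16) & 0xFFFFFFFFL;
        ByteBuffer cd = le(src.read(cdOffset, cdSize));
        for (int p = 0; p + 46 <= cdSize; ) {                  // walk central directory headers
            if (cd.getInt(p) != 0x02014b50) throw new IOException("corrupt central directory");
            int method = cd.getShort(p + 10) & 0xFFFF;
            int compSize = cd.getInt(p + 20), rawSize = cd.getInt(p + 24);
            int nameLen = cd.getShort(p + 28) & 0xFFFF;
            int skip = nameLen + (cd.getShort(p + 30) & 0xFFFF) + (cd.getShort(p + 32) & 0xFFFF);
            long localOffset = cd.getInt(p + 42) & 0xFFFFFFFFL;
            String name = new String(Arrays.copyOfRange(cd.array(), p + 46, p + 46 + nameLen),
                    StandardCharsets.UTF_8);                   // assumes UTF-8/ASCII names
            if (name.equals(entryName)) {
                // The local header's extra field can differ from the central one, so read its lengths.
                ByteBuffer lh = le(src.read(localOffset, 30));
                long dataStart = localOffset + 30 + (lh.getShort(26) & 0xFFFF) + (lh.getShort(28) & 0xFFFF);
                return decode(method, src.read(dataStart, compSize), rawSize);
            }
            p += 46 + skip;
        }
        throw new IOException("entry not found: " + entryName);
    }

    private static byte[] decode(int method, byte[] data, int rawSize) throws IOException {
        if (method == 0) return data;                          // STORED
        if (method != 8) throw new IOException("unsupported method " + method);
        Inflater inf = new Inflater(true);                     // raw DEFLATE, no zlib wrapper
        try {
            inf.setInput(Arrays.copyOf(data, data.length + 1)); // extra dummy byte, per Inflater docs
            byte[] out = new byte[rawSize];
            int n = 0;
            while (n < rawSize && !inf.finished()) {
                int r = inf.inflate(out, n, rawSize - n);
                if (r == 0 && inf.needsInput()) break;
                n += r;
            }
            if (n != rawSize) throw new IOException("truncated entry");
            return out;
        } catch (DataFormatException e) {
            throw new IOException(e);
        } finally {
            inf.end();
        }
    }

    private static ByteBuffer le(byte[] b) { return ByteBuffer.wrap(b).order(ByteOrder.LITTLE_ENDIAN); }
}
```

Against a real server this would be called as `RemoteZipEntry.extract(RemoteZipEntry.http(new URL("https://example.com/archive.zip")), "dir/file.txt")`, where the URL is a placeholder. Fetching the last 64 KB covers the worst case of a maximal archive comment; a more frugal version could request a small tail first and widen it only if the signature isn't found.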
Since you're using TrueZip, I recommend implementing `de.schlichtherle.io.rof.ReadOnlyFile` using Apache HTTP Client (issuing `Range` requests) and creating a `de.schlichtherle.util.zip.ZipFile` with that. This won't help with compressed TAR archives, since the entire archive is compressed as a single stream; the best you can do there is read it through an `InputStream` and abort the download once you have your entry.
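For a `.tar.gz`, that "read it as a stream and stop early" approach can be sketched with the JDK alone: decompress with `GZIPInputStream`, read each 512-byte tar header, skip the (512-byte-padded) data of entries you don't want, and stop as soon as the wanted entry has been read. `TarGzEntry`, `extract`, `field` and `skipFully` are hypothetical names. The sketch only understands plain ustar headers (no GNU long-name or pax extended headers, no base-256 sizes) and expects the entry to be smaller than 2 GB.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;

public class TarGzEntry {

    /** Returns the bytes of entryName from a .tar.gz stream, reading no further than that entry. */
    public static byte[] extract(InputStream raw, String entryName) throws IOException {
        try (DataInputStream in = new DataInputStream(new GZIPInputStream(raw))) {
            byte[] header = new byte[512];
            while (true) {
                in.readFully(header);
                if (header[0] == 0) break;                       // all-zero block marks the end
                String name = field(header, 0, 100);
                if (field(header, 257, 6).equals("ustar")) {     // POSIX ustar: optional path prefix
                    String prefix = field(header, 345, 155);
                    if (!prefix.isEmpty()) name = prefix + "/" + name;
                }
                long size = Long.parseLong(field(header, 124, 12).trim(), 8); // octal ASCII
                byte type = header[156];                         // '0' or NUL = regular file
                if (name.equals(entryName) && (type == '0' || type == 0)) {
                    byte[] data = new byte[Math.toIntExact(size)];
                    in.readFully(data);
                    return data;  // stream is closed here; the rest of the archive is never read
                }
                skipFully(in, (size + 511) / 512 * 512);         // data is padded to 512 bytes
            }
        }
        throw new IOException("entry not found: " + entryName);
    }

    /** NUL-terminated header field, limited to len bytes. */
    private static String field(byte[] h, int off, int len) {
        int end = off;
        while (end < off + len && h[end] != 0) end++;
        return new String(h, off, end - off, StandardCharsets.UTF_8);
    }

    private static void skipFully(InputStream in, long n) throws IOException {
        byte[] buf = new byte[8192];
        while (n > 0) {
            int r = in.read(buf, 0, (int) Math.min(buf.length, n));
            if (r < 0) throw new IOException("truncated archive");
            n -= r;
        }
    }
}
```

It could be fed `new URL("https://example.com/archive.tar.gz").openStream()` (placeholder URL). Everything up to and including the wanted entry still has to be downloaded and decompressed, which is exactly the limitation described above.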
I'm not sure whether there's a way to pull a single file out of a ZIP without downloading the whole archive first. But if you're the one hosting the ZIP file, you could write a Java servlet that opens the ZIP file on the server and returns only the requested entry in the response.
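The core of such a servlet is copying one entry of the server-side archive to the response stream. A minimal sketch with `java.util.zip.ZipFile`, leaving the Servlet API itself out so it stays dependency-free; `ZipEntryServer` and `writeEntry` are hypothetical names:

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Path;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

public class ZipEntryServer {

    /**
     * Copies one entry of a server-side ZIP file to out (e.g. a servlet response stream).
     * Returns false, writing nothing, if the entry is missing or is a directory.
     */
    public static boolean writeEntry(Path zipPath, String entryName, OutputStream out) throws IOException {
        try (ZipFile zip = new ZipFile(zipPath.toFile())) {
            ZipEntry entry = zip.getEntry(entryName);
            if (entry == null || entry.isDirectory()) return false;
            try (InputStream in = zip.getInputStream(entry)) {
                byte[] buf = new byte[8192];
                for (int n; (n = in.read(buf)) != -1; ) out.write(buf, 0, n);
            }
            return true;
        }
    }
}
```

In a servlet's `doGet`, one might call `writeEntry(archivePath, request.getParameter("entry"), response.getOutputStream())` and `response.sendError(HttpServletResponse.SC_NOT_FOUND)` when it returns `false`; the `entry` parameter name and `archivePath` are assumptions. Since `ZipFile` reads the central directory, the server seeks straight to the entry instead of scanning the archive.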
Well, at a minimum, you have to download the portion of the archive up to and including the compressed data of the file you want to extract. That suggests the following solution: open a `URLConnection` to the archive, get its input stream, wrap it in a `ZipInputStream`, and repeatedly call `getNextEntry()` and `closeEntry()` to iterate through all the entries in the file until you reach the one you want. Then you can read its data using `ZipInputStream.read(...)`. The Java code would look something like this:
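A sketch of that loop, assuming the wanted name matches `ZipEntry.getName()` exactly (forward slashes, no leading `/`); `StreamingZipExtract` and `extract` are hypothetical names:

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLConnection;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

public class StreamingZipExtract {

    /**
     * Reads the archive sequentially and stops once entryName has been copied to target.
     * Returns false if the archive ends without containing that entry.
     */
    public static boolean extract(URL archiveUrl, String entryName, Path target) throws IOException {
        URLConnection conn = archiveUrl.openConnection();
        try (ZipInputStream zip = new ZipInputStream(conn.getInputStream())) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.getName().equals(entryName)) {
                    // ZipInputStream.read() returns -1 at the end of the current entry,
                    // so Files.copy stops there instead of reading the rest of the archive.
                    Files.copy(zip, target, StandardCopyOption.REPLACE_EXISTING);
                    return true;
                }
                zip.closeEntry(); // skip over this entry's data
            }
            return false;
        }
    }
}
```

This works on a forward-only stream because each ZIP entry's local header sits directly before its data; the trade-off is that every entry preceding the wanted one is still downloaded.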
This is, of course, untested.
Since TrueZIP 7.2, there is a new client API in the module TrueZIP Path. It is an implementation of an NIO.2 `FileSystemProvider` for Java SE 7. Using this API, you can access an HTTP URI like any other NIO.2 `Path`.
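TrueZIP's own classes aren't shown here. For comparison, the JDK ships its own NIO.2 provider for ZIP files (`zipfs`), which only opens local archives, not HTTP URIs; this short sketch shows the `FileSystem`/`Path`/`Files` style of access that such a provider gives you. `NioZipCopy` and `copyEntry` are hypothetical names:

```java
import java.io.IOException;
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class NioZipCopy {

    /** Copies one entry out of a local ZIP using the JDK's built-in zipfs NIO.2 provider. */
    public static void copyEntry(Path zipFile, String entryName, Path target) throws IOException {
        // The (ClassLoader) cast avoids overload ambiguity on Java 13+,
        // where newFileSystem(Path, Map) also exists.
        try (FileSystem fs = FileSystems.newFileSystem(zipFile, (ClassLoader) null)) {
            Path entry = fs.getPath(entryName);   // a Path that lives inside the archive
            Files.copy(entry, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }
}
```

The TrueZIP Path module applies the same `Path`/`Files` style to archives addressed by HTTP URIs, which is what makes it relevant to the original question.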