I have noticed that the unzip facility in Java is extremely slow compared to using a native tool such as WinZip.
Is there a third party library available for Java that is more efficient?
Open Source is preferred.
Edit
Here is a speed comparison using the Java built-in solution vs 7zip.
I added buffered input/output streams in my original solution (thanks Jim, this did make a big difference).
Zip File size: 800K
Java Solution: 2.7 seconds
7Zip solution: 204 ms
Here is the modified code using the built-in Java decompression:
/** Unpacks the given zip file using the built-in Java facilities for unzip. */
@SuppressWarnings("unchecked")
public final static void unpack(File zipFile, File rootDir) throws IOException
{
ZipFile zip = new ZipFile(zipFile);
Enumeration<ZipEntry> entries = (Enumeration<ZipEntry>) zip.entries();
while(entries.hasMoreElements()) {
ZipEntry entry = entries.nextElement();
java.io.File f = new java.io.File(rootDir, entry.getName());
if (entry.isDirectory()) { // skip directory entries; parent directories are created below as needed
continue;
}
if (!f.exists()) {
f.getParentFile().mkdirs();
f.createNewFile();
}
BufferedInputStream bis = new BufferedInputStream(zip.getInputStream(entry)); // get the input stream
BufferedOutputStream bos = new BufferedOutputStream(new java.io.FileOutputStream(f));
while (bis.available() > 0) { // copy one byte at a time from 'bis' to 'bos'
bos.write(bis.read());
}
bos.close();
bis.close();
}
}
The problem is not the unzipping, it's the inefficient way you write the unzipped data back to disk. My benchmarks show that using
InputStream is = zip.getInputStream(entry); // get the input stream
OutputStream os = new java.io.FileOutputStream(f);
byte[] buf = new byte[4096];
int r;
while ((r = is.read(buf)) != -1) {
os.write(buf, 0, r);
}
os.close();
is.close();
instead reduces the method's execution time by a factor of 5 (from 5 to 1 second for a 6 MB zip file).
The likely culprit is your use of bis.available(). Aside from being incorrect (available() returns the number of bytes that can be read before a call to read would block, not the number of bytes left until the end of the stream), this bypasses the buffering provided by BufferedInputStream, requiring a native system call for every byte copied into the output file.
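To make the per-byte cost visible, here is a small sketch that counts how often available() reaches the stream underneath a BufferedInputStream. CountingStream and AvailableCallCounter are hypothetical names made up for this illustration, and an in-memory ByteArrayInputStream stands in for the zip entry stream, so this only shows the call pattern, not actual timings.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

final class AvailableCallCounter {
    private AvailableCallCounter() {}

    /** Hypothetical wrapper that counts calls to available() on the wrapped stream. */
    static final class CountingStream extends FilterInputStream {
        int availableCalls = 0;

        CountingStream(InputStream in) {
            super(in);
        }

        @Override
        public int available() throws IOException {
            availableCalls++;
            return super.available();
        }
    }

    /** Runs the available()/read() loop over 'size' bytes and returns the call count. */
    static int countCalls(int size) throws IOException {
        CountingStream counting = new CountingStream(new ByteArrayInputStream(new byte[size]));
        try (BufferedInputStream bis = new BufferedInputStream(counting)) {
            // Same loop shape as in the question: one available() check per byte read.
            while (bis.available() > 0) {
                bis.read();
            }
        }
        return counting.availableCalls;
    }
}
```

Calling AvailableCallCounter.countCalls(1000) should report at least one underlying available() call per byte, even though the data itself is served from the buffer. With a file or inflater stream underneath, each of those calls is real work.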
Note that wrapping in a buffered stream is not necessary if you use the bulk read and write methods as I do above, and that the code to close the resources is not exception safe (if reading or writing fails for any reason, neither is nor os would be closed). Finally, if you have IOUtils (from Apache Commons IO) on the class path, I recommend using its well-tested IOUtils.copy instead of rolling your own.
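For the exception-safety point, one possible shape is sketched below, assuming Java 7 or later so that try-with-resources is available. The class name ZipUnpacker and the 8192-byte buffer size are my own choices for illustration, not something taken from the code above, and the sketch does not validate entry names.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Enumeration;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

final class ZipUnpacker {
    private ZipUnpacker() {}

    /** Unpacks zipFile below rootDir; every stream is closed even if copying fails. */
    static void unpack(File zipFile, File rootDir) throws IOException {
        try (ZipFile zip = new ZipFile(zipFile)) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                File target = new File(rootDir, entry.getName());
                if (entry.isDirectory()) {
                    target.mkdirs();
                    continue;
                }
                File parent = target.getParentFile();
                if (parent != null) {
                    parent.mkdirs();
                }
                // Both streams are closed automatically, in reverse order of opening.
                try (InputStream in = zip.getInputStream(entry);
                     OutputStream out = new FileOutputStream(target)) {
                    byte[] buf = new byte[8192];
                    int n;
                    // read() returns -1 at end of stream; that is the only reliable end test.
                    while ((n = in.read(buf)) != -1) {
                        out.write(buf, 0, n);
                    }
                }
            }
        }
    }
}
```

The ZipFile itself is also opened in a try-with-resources statement, so the archive handle is released when an entry fails halfway through.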
Make sure you are feeding the unzip method a BufferedInputStream in your Java application. If you have made the mistake of using an unbuffered input stream, your I/O performance is guaranteed to suck.
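A minimal sketch of what that could look like, assuming the archive is read sequentially through ZipInputStream rather than through ZipFile as in the question. StreamingUnzip is a hypothetical class name, and Files.copy from java.nio.file (Java 7+) is used here in place of a hand-written copy loop.

```java
import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.StandardCopyOption;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

final class StreamingUnzip {
    private StreamingUnzip() {}

    /** Unpacks zipFile below rootDir, reading the archive through a buffered stream. */
    static void unpack(File zipFile, File rootDir) throws IOException {
        // The BufferedInputStream sits between the file and the zip decoder,
        // so the decoder's small reads do not each hit the file system.
        try (ZipInputStream zis = new ZipInputStream(
                new BufferedInputStream(new FileInputStream(zipFile)))) {
            ZipEntry entry;
            while ((entry = zis.getNextEntry()) != null) {
                File target = new File(rootDir, entry.getName());
                if (entry.isDirectory()) {
                    target.mkdirs();
                    continue;
                }
                File parent = target.getParentFile();
                if (parent != null) {
                    parent.mkdirs();
                }
                // Copies up to the end of the current entry and leaves zis open.
                Files.copy(zis, target.toPath(), StandardCopyOption.REPLACE_EXISTING);
            }
        }
    }
}
```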
I have found an 'inelegant' solution. There is an open source utility, 7-Zip (www.7-zip.org), that is free to use. You can download the command line version (http://www.7-zip.org/download.html). 7-Zip itself is only supported on Windows, but it looks like it has been ported to other platforms (p7zip).
Obviously this solution is not ideal since it is platform specific and relies on an executable. However, the speed compared to doing the unzip in Java is incredible.
Here is the code for the utility function that I created to interface with this utility. There is room for improvement as the code below is Windows specific.
/** Unpacks the zipfile to the output directory. Note: this code relies on 7-zip
(specifically the cmd line version, 7za.exe). The exeDir specifies the location of the 7za.exe utility. */
public static void unpack(File zipFile, File outputDir, File exeDir) throws IOException, InterruptedException
{
if (!zipFile.exists()) throw new FileNotFoundException(zipFile.getAbsolutePath());
if (!exeDir.exists()) throw new FileNotFoundException(exeDir.getAbsolutePath());
if (!outputDir.exists()) outputDir.mkdirs();
// Pass each argument separately so that paths containing spaces are not split apart.
ProcessBuilder builder = new ProcessBuilder(new File(exeDir, "7za.exe").getAbsolutePath(), "-y", "e", zipFile.getAbsolutePath());
builder.directory(outputDir);
Process p = builder.start();
int rc = p.waitFor();
if (rc != 0) {
log.severe("Util::unpack() 7za process did not complete normally. rc: " + rc);
}
}
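As a rough idea of how the Windows-specific part could be loosened, the sketch below picks the executable name from the os.name system property. It assumes that the p7zip port installs its command line binary as 7za in exeDir, which I have not verified for every distribution. SevenZipCommand, build and run are hypothetical names, and inheritIO() requires Java 7 or later.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class SevenZipCommand {
    private SevenZipCommand() {}

    /** Builds the argument list; osName is normally System.getProperty("os.name"). */
    static List<String> build(File exeDir, File zipFile, String osName) {
        boolean windows = osName.toLowerCase(Locale.ROOT).startsWith("windows");
        // Assumption: the p7zip port provides a binary called "7za" on non-Windows systems.
        String exeName = windows ? "7za.exe" : "7za";
        List<String> command = new ArrayList<String>();
        command.add(new File(exeDir, exeName).getAbsolutePath());
        command.add("-y"); // same switches as the utility function above
        command.add("e");
        command.add(zipFile.getAbsolutePath());
        return command;
    }

    /** Runs the extraction in outputDir and returns the process exit code. */
    static int run(File zipFile, File outputDir, File exeDir)
            throws IOException, InterruptedException {
        ProcessBuilder builder = new ProcessBuilder(
                build(exeDir, zipFile, System.getProperty("os.name")));
        builder.directory(outputDir);
        // Forward the tool's output to this process so a full pipe cannot stall it.
        builder.inheritIO();
        return builder.start().waitFor();
    }
}
```

Because every argument is its own list element, no shell is involved and paths with spaces need no quoting.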