I'm trying to read a zip file, check that it has some required files, and then write all valid files out to another zip file. The basic introduction to java.util.zip has a lot of Java-isms and I'd love to make my code more Scala-native. Specifically, I'd like to avoid the use of vars. Here's what I have:
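    import java.io.{BufferedOutputStream, FileOutputStream}
    import java.util.zip.{ZipEntry, ZipOutputStream}

    // Assumes zipIn (a java.util.zip.ZipInputStream) and entryIsValid are defined elsewhere
    val fos = new FileOutputStream("new.zip")
    val zipOut = new ZipOutputStream(new BufferedOutputStream(fos))
    while (zipIn.available == 1) {
      val entry = zipIn.getNextEntry
      if (entryIsValid(entry)) {
        zipOut.putNextEntry(new ZipEntry("subdir/" + entry.getName()))
        // read data into the data Array
        var data = new Array[Byte](1024)
        var count = zipIn.read(data, 0, 1024)
        while (count != -1) {
          zipOut.write(data, 0, count)
          count = zipIn.read(data, 0, 1024)
        }
      }
    }
    // close both streams once, after every entry has been processed
    zipIn.close
    zipOut.close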
I should add that I'm using Scala 2.7.7.
Based on http://harrah.github.io/browse/samples/compiler/scala/tools/nsc/io/ZipArchive.scala.html:
Without tail recursion, I'd avoid recursion: you would run the risk of a stack overflow. You could wrap zipIn.read(data) in a scala.BufferedIterator[Byte] and go from there.

I don't think there's anything particularly wrong with using Java classes that are designed to work in imperative fashion in the fashion they were designed. Idiomatic Scala includes being able to use idiomatic Java as it was intended, even if the styles do clash a bit.
However, if you want--perhaps as an exercise, or perhaps because it does slightly clarify the logic--to do this in a more functional var-free way, you can do so. In 2.8, it's particularly nice, so even though you're using 2.7.7, I'll give a 2.8 answer.
First, we need to set up the problem, which you didn't entirely, but let's suppose we have something like this:
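A minimal sketch of such a setup, under stated assumptions: the streams are a ZipInputStream and a ZipOutputStream named zipIn and zipOut, buffer is a 1024-byte array, and entryIsValid is a hypothetical stand-in for the check in the question. ZipCopySketch and copyValid are names made up for illustration. The copy itself is written with Iterator.continually, which is also available from 2.8 and behaves like Stream.continually for a single pass such as this one.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream}
import java.util.zip.{ZipEntry, ZipInputStream, ZipOutputStream}

object ZipCopySketch {
  // Hypothetical validity check; the question does not show entryIsValid.
  def entryIsValid(entry: ZipEntry): Boolean = !entry.isDirectory

  // Copies every valid entry from zipIn to zipOut under "subdir/".
  // The caller opens and closes both streams.
  def copyValid(zipIn: ZipInputStream, zipOut: ZipOutputStream): Unit = {
    val buffer = new Array[Byte](1024)
    Iterator.continually(zipIn.getNextEntry).  // first use: walk the entries
      takeWhile(_ != null).                    // getNextEntry returns null at the end
      filter(entryIsValid).
      foreach { entry =>
        zipOut.putNextEntry(new ZipEntry("subdir/" + entry.getName))
        Iterator.continually(zipIn.read(buffer)).  // second use: read chunks
          takeWhile(_ != -1).                      // read returns -1 at the end of the entry
          foreach(count => zipOut.write(buffer, 0, count))
        zipOut.closeEntry()
      }
  }
}
```

With files, the two streams could be opened as new ZipInputStream(new FileInputStream("old.zip")) and new ZipOutputStream(new FileOutputStream("new.zip")); the file names are placeholders.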
Now, given this, we want to copy the zip file. The trick we can use is the continually method in collection.immutable.Stream. What it does is perform a lazily-evaluated loop for you. You can then take and filter the results to terminate and process what you want. It's a handy pattern to use when you have something that you want to be an iterator, but it isn't. (If the item updates itself you can use .iterate in Iterable or Iterator--that's usually even better.) Here's the application to this case, used twice: once to get the entries, and once to read/write chunks of data:

Pay close attention to the . at the end of some lines! I would normally write this on one long line, but it's nicer to have it wrap so you can see it all here.

Just in case it isn't clear, let's unpack one of the uses of continually.

Wrapping the read as Stream.continually(zipIn.read(buffer)) asks to keep calling zipIn.read(buffer) for as many times as necessary, storing the integer that results.

A takeWhile that stops at -1 specifies how many times are necessary, returning a stream of indefinite length but which will quit when it hits a -1.

A foreach then processes the stream, taking each item in turn (the count), and using it to write the buffer. This works in a slightly sneaky way, since you rely upon the fact that zipIn has just been called to get the next element of the stream--if you tried to do this again, not on a single pass through the stream, it would fail because buffer would be overwritten. But here it's okay.

So, there it is: a slightly more compact, possibly easier to understand, possibly less easy to understand method that is more functional (though there are still side-effects galore). In 2.7.7, in contrast, I would actually do it the Java way because Stream.continually isn't available, and the overhead of building a custom Iterator isn't worth it for this one case. (It would be worth it if I was going to do more zip file processing and could reuse the code, however.)

Edit: The looking-for-available-to-go-zero method is kind of flaky for detecting the end of the zip file. I think the "correct" way is to wait until you get a null back from getNextEntry. With that in mind, I've edited the previous code (there was a takeWhile(_ => zipIn.available==1) that is now a takeWhile(_ != null)) and provided a 2.7.7 iterator based version below (note how small the main loop is, once you get through the work of defining the iterators, which do admittedly use vars):

I'd try something like this (yes, pretty much the same idea sblundy had):
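A minimal sketch of that idea, assuming it means wrapping the repeated read calls in an Iterator so that the copy loop itself needs no vars. ReadChunks, ChunkSizes and copy are hypothetical names, and the vars are confined to the wrapper class.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, InputStream, OutputStream}

object ReadChunks {
  // Exposes the successive results of in.read(buffer) as an Iterator[Int].
  // Each element is only meaningful until the next read overwrites buffer.
  class ChunkSizes(in: InputStream, buffer: Array[Byte]) extends Iterator[Int] {
    private var pending = false  // true when count holds a result not yet handed out
    private var count = 0

    def hasNext: Boolean = {
      if (!pending) { count = in.read(buffer); pending = true }
      count != -1                // -1 marks the end of the stream (or zip entry)
    }

    def next(): Int = {
      if (!hasNext) throw new NoSuchElementException("end of stream")
      pending = false
      count
    }
  }

  // The main loop: one line once the iterator exists.
  def copy(in: InputStream, out: OutputStream): Unit = {
    val buffer = new Array[Byte](1024)
    new ChunkSizes(in, buffer).foreach(n => out.write(buffer, 0, n))
  }
}
```

Because a ZipInputStream reports -1 at the end of each entry, the same copy could be called once per entry after getNextEntry.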
It could be simplified like below, but I'm not very fond of it. I'd prefer for read not to be able to return 0...

Using Scala 2.8 and a tail-recursive call:
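A minimal sketch of the tail-recursive approach, assuming Scala 2.8 or later for scala.annotation.tailrec and default arguments. TailRecCopy, copy and bufferSize are hypothetical names. The @tailrec annotation makes the compiler reject the method if the call is not in tail position, so the recursion cannot overflow the stack.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, InputStream, OutputStream}
import scala.annotation.tailrec

object TailRecCopy {
  // Copies in to out until read reports -1; returns the number of bytes copied.
  def copy(in: InputStream, out: OutputStream, bufferSize: Int = 1024): Long = {
    val buffer = new Array[Byte](bufferSize)

    @tailrec
    def loop(total: Long): Long = {
      val count = in.read(buffer)
      if (count == -1) total
      else {
        out.write(buffer, 0, count)
        loop(total + count)  // tail call: compiled to a loop, no var needed
      }
    }

    loop(0L)
  }
}
```

For the zip case, copy(zipIn, zipOut) would be called once per entry, between putNextEntry and the next getNextEntry.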