I have written some Scala code to perform an element-wise operation on a collection. Here I defined two methods that perform the same task. One method uses `zip` and the other uses `zipped`.
```scala
def ES (arr: Array[Double], arr1: Array[Double]): Array[Double] = arr.zip(arr1).map(x => x._1 + x._2)
def ES1(arr: Array[Double], arr1: Array[Double]): Array[Double] = (arr, arr1).zipped.map((x, y) => x + y)
```
To compare these two methods in terms of speed, I wrote the following code:
```scala
def fun(arr: Array[Double], arr1: Array[Double], f: (Array[Double], Array[Double]) => Array[Double], itr: Int): Unit = {
  val t0 = System.nanoTime()
  for (i <- 1 to itr) {
    f(arr, arr1)
  }
  val t1 = System.nanoTime()
  println("Total Time Consumed: " + ((t1 - t0).toDouble / 1000000000) + " Seconds")
}
```
I call the `fun` method and pass `ES` and `ES1` as below:
```scala
fun(Array.fill(10000)(math.random), Array.fill(10000)(math.random), ES , 100000)
fun(Array.fill(10000)(math.random), Array.fill(10000)(math.random), ES1, 100000)
```
The results show that the method named `ES1` that uses `zipped` is faster than the method `ES` that uses `zip`.
Based on these observations, I have two questions.
Why is `zipped` faster than `zip`?
Is there any even faster way to do element-wise operations on a collection in Scala?
Consider `lazyZip` instead of `zip`
Scala 2.13 added `lazyZip` in favour of `.zipped`.
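For example, on Scala 2.13+ the `zipped` version can be rewritten with `lazyZip` like this (the name `ES2` is my own, not from the question):

```scala
// Requires Scala 2.13+: lazyZip fuses the pairing and the map into a single pass,
// so no intermediate collection of tuples is built.
def ES2(arr: Array[Double], arr1: Array[Double]): Array[Double] =
  arr.lazyZip(arr1).map((x, y) => x + y)
```

It is a drop-in replacement for `ES1` with the same signature and semantics.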
`zipped` (and hence `lazyZip`) is faster than `zip` because, as explained by Tim and Mike Allen, `zip` followed by `map` will result in two separate transformations due to strictness, whilst `zipped` followed by `map` will result in a single transformation executed in one go due to laziness.

`zipped` gives `Tuple2Zipped`, and analysing `Tuple2Zipped.map`, we see the two collections `coll1` and `coll2` are iterated over, and on each iteration the function `f` passed to `map` is applied along the way, without having to allocate and transform intermediary structures.
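A minimal sketch of that single-pass strategy (this is not the actual `Tuple2Zipped` source, which works through iterators and a builder, but it illustrates the same fusion):

```scala
// Walk both collections in lockstep and apply f directly: no intermediate
// array and no tuple objects are ever allocated.
def zippedMap(coll1: Array[Double], coll2: Array[Double])(f: (Double, Double) => Double): Array[Double] = {
  val n = math.min(coll1.length, coll2.length)
  val out = new Array[Double](n)
  var i = 0
  while (i < n) {
    out(i) = f(coll1(i), coll2(i))
    i += 1
  }
  out
}
```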
Applying Travis' benchmarking method to compare the new `lazyZip` with the deprecated `zipped`, `lazyZip` seems to perform a bit better than `zipped` on `ArraySeq`. Interestingly, notice significantly degraded performance when using `lazyZip` on `Array`.

To answer your second question:
The sad truth is that, despite their conciseness, improved productivity, and resilience to bugs, functional languages aren't necessarily the most performant. Using higher-order functions to define a projection to be executed against collections is not free, and your tight loop highlights this. As others have pointed out, additional storage allocation for intermediate and final results will also have overhead.
If performance is critical, although by no means universally, in cases like yours you can unwind Scala's operations back into imperative equivalents in order to regain more direct control over memory usage and eliminate function calls.
In your specific example, the `zipped` sums can be performed imperatively by pre-allocating a fixed, mutable array of the correct size (since `zip` stops when one of the collections runs out of elements), and then adding elements at the appropriate index together (since accessing array elements by ordinal index is a very fast operation). Adding a third function, `ES3`, to your test suite and running the same benchmark, on my i7 the imperative version comes out noticeably faster than both `ES` and `ES1`.
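A sketch of such an `ES3` (the exact code from the original answer may differ):

```scala
// Pre-allocate the result at the smaller of the two lengths and fill it by
// index: no tuples and no intermediate arrays are allocated.
def ES3(arr: Array[Double], arr1: Array[Double]): Array[Double] = {
  val minSize = math.min(arr.length, arr1.length)
  val result = new Array[Double](minSize)
  for (i <- 0 until minSize) {
    result(i) = arr(i) + arr1(i)
  }
  result
}
```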
Even more heinous would be to do direct in-place mutation of the shorter of the two arrays. This would obviously corrupt the contents of one of the arrays, and should only be done if the original array won't be needed again:
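A sketch of that in-place variant (the name `ES4` is mine; note that it deliberately overwrites the shorter array's contents):

```scala
// Accumulate the sums directly into the shorter array, corrupting its
// original contents in exchange for zero allocation.
def ES4(arr: Array[Double], arr1: Array[Double]): Array[Double] = {
  val (shorter, longer) = if (arr.length <= arr1.length) (arr, arr1) else (arr1, arr)
  var i = 0
  while (i < shorter.length) {
    shorter(i) += longer(i)
    i += 1
  }
  shorter
}
```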
But obviously, direct mutation of array elements isn't in the spirit of Scala.
You should always be cautious with performance measurement because of JIT compilation, but a likely reason is that `zipped` is lazy and extracts elements from the original `Array` values during the `map` call, whereas `zip` creates a new `Array` object and then calls `map` on the new object.

None of the other answers mention the primary reason for the difference in speed, which is that the `zipped` version avoids 10,000 tuple allocations. As a couple of the other answers do note, the `zip` version involves an intermediate array, while the `zipped` version doesn't, but allocating an array for 10,000 elements isn't what makes the `zip` version so much worse: it's the 10,000 short-lived tuples that are being put into that array. These are represented by objects on the JVM, so you're doing a bunch of object allocations for things that you're immediately going to throw away.

The rest of this answer just goes into a little more detail about how you can confirm this.
Better benchmarking
You really want to be using a framework like jmh to do any kind of benchmarking responsibly on the JVM, and even then the responsibly part is hard, although setting up jmh itself isn't too bad. You add the sbt-jmh plugin in `project/plugins.sbt`, enable it in `build.sbt` (I'm using 2.11.8 since you mention that's what you're using), and then write the benchmark class.
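A sketch of that setup (the plugin version, package name, and benchmark method names are assumptions; the original answer's files may differ):

```scala
// project/plugins.sbt (plugin version is an assumption):
//   addSbtPlugin("pl.project13.scala" % "sbt-jmh" % "0.3.7")
//
// build.sbt:
//   scalaVersion := "2.11.8"
//   enablePlugins(JmhPlugin)

// src/main/scala/zipped_bench/ZippedBench.scala
package zipped_bench

import org.openjdk.jmh.annotations._

@State(Scope.Benchmark)
class ZippedBench {
  // Shared input arrays so both benchmarks measure the same work.
  val arr1: Array[Double] = Array.fill(10000)(math.random)
  val arr2: Array[Double] = Array.fill(10000)(math.random)

  def ES(arr: Array[Double], arr1: Array[Double]): Array[Double] =
    arr.zip(arr1).map(x => x._1 + x._2)

  def ES1(arr: Array[Double], arr1: Array[Double]): Array[Double] =
    (arr, arr1).zipped.map((x, y) => x + y)

  @Benchmark def withZip: Array[Double] = ES(arr1, arr2)
  @Benchmark def withZipped: Array[Double] = ES1(arr1, arr2)
}
```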
And run it with `sbt "jmh:run -i 10 -wi 10 -f 2 -t 1 zipped_bench.ZippedBench"`, which shows that the `zipped` version gets about 80% more throughput, which is probably more or less the same as your measurements.

Measuring allocations
You can also ask jmh to measure allocations with `-prof gc`, where `gc.alloc.rate.norm` is probably the most interesting part of the output, showing that the `zip` version is allocating over three times as much as `zipped`.

Imperative implementations
If I knew that this method was going to be called in extremely performance-sensitive contexts, I'd probably implement it like this:
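A sketch of that implementation (the source names it `withWhile` in the comparison that follows):

```scala
// Plain while loop: no closures, no tuples, no intermediate collections.
def withWhile(arr: Array[Double], arr1: Array[Double]): Array[Double] = {
  val minSize = math.min(arr.length, arr1.length)
  val newArr = new Array[Double](minSize)
  var i = 0
  while (i < minSize) {
    newArr(i) = arr(i) + arr1(i)
    i += 1
  }
  newArr
}
```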
Note that unlike the optimized version in one of the other answers, this uses `while` instead of a `for`, since the `for` will still desugar into Scala collections operations. We can compare this implementation (`withWhile`), the other answer's optimized (but not in-place) implementation (`withFor`), and the two original implementations. There's a really huge difference between the imperative and functional versions, and all of these method signatures are exactly identical and the implementations have the same semantics. It's not like the imperative implementations are using global state, etc. While the `zip` and `zipped` versions are more readable, I personally don't think there's any sense in which the imperative versions are against the "spirit of Scala", and I wouldn't hesitate to use them myself.

With tabulate
Update: I added a `tabulate` implementation to the benchmark based on a comment in another answer. It's much faster than the `zip` versions, although still much slower than the imperative ones. This is what I'd expect, since there's nothing inherently expensive about calling a function, and because accessing array elements by index is very cheap.
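A sketch of such a `tabulate` implementation (the benchmark's exact code may differ; the name `withTabulate` is an assumption):

```scala
// Array.tabulate allocates the result once and fills it by index; the only
// per-element cost is one call to the supplied function.
def withTabulate(arr: Array[Double], arr1: Array[Double]): Array[Double] = {
  val minSize = math.min(arr.length, arr1.length)
  Array.tabulate(minSize)(i => arr(i) + arr1(i))
}
```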