Fastest way to take elementwise sum of two Lists

Posted 2020-02-28 10:36

Question:

I can do an elementwise operation such as a sum using zipped. Suppose I have two Lists, L1 and L2, as shown below:

val L1 = List(1,2,3,4)
val L2 = List(5,6,7,8)

I can take the elementwise sum in the following way:

(L1,L2).zipped.map(_+_)

and the result is

List(6, 8, 10, 12) 

as expected.

I am using zipped in my actual code, but it takes too much time. In reality, each List has more than 1000 elements, I have more than 1000 Lists, and my algorithm is iterative, with up to one billion iterations.

In my code I have to do the following:

val list = ((L1, L2).zipped.map(_ + _).map(_ * math.random), L3).zipped.map(_ + _)

L1, L2 and L3 all have the same size. Moreover, I have to run my actual code on a cluster.

What is the fastest way to take elementwise sum of Lists in Scala?

Answer 1:

One option would be to use a streaming implementation; taking advantage of laziness may improve performance.

An example using LazyList (introduced in Scala 2.13).

def usingLazyList(l1: LazyList[Double], l2: LazyList[Double], l3: LazyList[Double]): LazyList[Double] =
  ((l1 zip l2) zip l3).map {
    case ((a, b), c) =>
      ((a + b) * math.random()) + c
  }

And an example using fs2.Stream (provided by the fs2 library).

import fs2.Stream
import cats.effect.IO

def usingFs2Stream(s1: Stream[IO, Double], s2: Stream[IO, Double], s3: Stream[IO, Double]): Stream[IO, Double] =
  s1.zipWith(s2) {
    case (a, b) =>
      (a + b) * math.random()
  }.zipWith(s3) {
    case (acc, c) =>
      acc + c
  }
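If the data has to stay in strict Lists, a related single-pass option in Scala 2.13 is lazyZip, which replaces the deprecated zipped and combines the three lists without building intermediate lists of tuples. The following is only a sketch: the object name LazyZipSum, the method name combine and the nextRandom parameter are hypothetical, and nextRandom exists only so that the random factor can be replaced by a fixed value when checking the result.

```scala
object LazyZipSum {
  // Hypothetical helper: computes ((a + b) * r) + c for each position,
  // where r comes from `nextRandom` (by default math.random()).
  // lazyZip defers the zipping, so only the final List is materialised.
  def combine(
      l1: List[Double],
      l2: List[Double],
      l3: List[Double],
      nextRandom: () => Double = () => math.random()
  ): List[Double] =
    l1.lazyZip(l2).lazyZip(l3).map((a, b, c) => (a + b) * nextRandom() + c)
}
```

For example, LazyZipSum.combine(List(1.0, 2.0), List(3.0, 4.0), List(10.0, 20.0), () => 1.0) gives List(14.0, 26.0). Whether this is fast enough for your sizes would need to be measured.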

However, if those are still too slow, the best alternative would be to use plain arrays.

Here is an example using ArraySeq (also introduced in Scala 2.13), which at least preserves immutability. You may use raw arrays if you prefer, but take care, since they are mutable.
If you want, you may also use the scala-parallel-collections module, as the example does, to be even more performant.

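import scala.collection.immutable.ArraySeq
// Provides .par; requires the separate scala-parallel-collections module.
import scala.collection.parallel.CollectionConverters._

def usingArraySeq(a1: ArraySeq[Double], a2: ArraySeq[Double], a3: ArraySeq[Double]): ArraySeq[Double] = {
  val length = a1.length

  // Mutable buffer that is filled here and never exposed directly.
  val arr = Array.ofDim[Double](length)

  // Each index is written exactly once, so the parallel writes do not conflict.
  (0 until length).par.foreach { i =>
    arr(i) = ((a1(i) + a2(i)) * math.random()) + a3(i)
  }

  // Wrap the array without copying; safe because arr is not used afterwards.
  ArraySeq.unsafeWrapArray(arr)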
}
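For the raw-array variant mentioned above, a minimal sequential sketch could look like the following. The names RawArraySum and combineInto are hypothetical, and passing in the output array and a scala.util.Random is my own assumption rather than part of the original question; the idea is that an iterative algorithm can reuse the same output buffer on every iteration instead of allocating a new collection each time. Array[Double] stores unboxed primitives, unlike List[Double].

```scala
import scala.util.Random

object RawArraySum {
  // Hypothetical helper: writes ((a1(i) + a2(i)) * r) + a3(i) into `out`,
  // drawing a fresh r in [0, 1) from `rng` for every element.
  // `out` is overwritten in place, so the caller can reuse it across iterations.
  def combineInto(
      a1: Array[Double],
      a2: Array[Double],
      a3: Array[Double],
      out: Array[Double],
      rng: Random
  ): Unit = {
    require(
      a1.length == a2.length && a2.length == a3.length && a3.length == out.length,
      "all arrays must have the same length"
    )
    // A plain while loop avoids closures and intermediate collections.
    var i = 0
    while (i < out.length) {
      out(i) = ((a1(i) + a2(i)) * rng.nextDouble()) + a3(i)
      i += 1
    }
  }
}
```

A caller would allocate the buffer once, for example val out = new Array[Double](a1.length), and then call RawArraySum.combineInto(a1, a2, a3, out, new Random()) inside the loop. Because the arrays are mutable, take care not to share out while it is being written.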