I want to reduce an RDD[Array[Double]] so that each element of an array is added to the corresponding element of the next array. For the moment I use this code:
val rdd1: RDD[Array[Double]] = ...
val coord = rdd1.reduce((x, y) => (x, y).zipped.map(_ + _))
Is there a more efficient way to do this? Right now it is very costly.
I assume the concern is that you have very large Array[Double]s and that the transformation as written does not distribute their addition. If so, you could do something like the following (untested):
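One way to do that (a sketch along those lines, not necessarily the exact code intended, reusing rdd1 from the question) is to key every value by its position in the array, sum per position across the cluster, and reassemble the result on the driver:

// untested sketch: distribute the per-position sums instead of adding whole arrays in reduce
val summed: Array[Double] = rdd1
  .flatMap(arr => arr.zipWithIndex.map { case (v, i) => (i, v) }) // (position, value) pairs
  .reduceByKey(_ + _)                                             // sum each position across all arrays
  .collect()
  .sortBy(_._1)
  .map(_._2)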
Using zipped.map is very inefficient, because it creates a lot of temporary objects and boxes the doubles.
If you use spire, you can just do this
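A sketch of that, assuming spire's implicits are in scope so that + means element-wise addition on Array[Double]:

import spire.implicits._
// + on two Array[Double]s is element-wise addition here, so the reduce is just _ + _
val coord = rdd1.reduce(_ + _)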
This is much nicer to look at, and should also be much more efficient.
Spire is a dependency of Spark, so you should be able to do the above without any extra dependencies. At least it worked in the spark-shell for Spark 1.3.1 here.
This will work for any array where there is an AdditiveSemigroup typeclass instance available for the element type. In this case, the element type is Double. Spire typeclasses are @specialized for Double, so there will be no boxing going on anywhere.
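For example, the same pattern should compile for other element types that spire has instances for, e.g. Int (a small sketch outside Spark):

import spire.implicits._
val a = Array(1, 2, 3)
val b = Array(10, 20, 30)
val c = a + b // element-wise: Array(11, 22, 33)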
If you really want to know what is going on to make this work, you have to use reify:
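For instance, something along these lines (a sketch; the exact tree printed depends on the spire and Scala versions):

import scala.reflect.runtime.universe._
import spire.implicits._
// reify captures the typed tree, so printing it shows which implicit
// instance and syntax enrichment turn x + y into a typeclass call
val expr = reify { (x: Array[Double], y: Array[Double]) => x + y }
println(expr.tree)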
So the addition works because there is an instance of AdditiveSemigroup for Array[Double].
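Summoning the instance explicitly is a quick way to check that it really is there (a sketch):

import spire.algebra.AdditiveSemigroup
import spire.implicits._
// resolving the instance by hand; plus is the element-wise addition that _ + _ uses
val sg = implicitly[AdditiveSemigroup[Array[Double]]]
println(sg.plus(Array(1.0, 2.0), Array(3.0, 4.0)).toList) // List(4.0, 6.0)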