Distributed BlockMatrix out of Spark Matrices

Posted 2019-08-09 07:21

Question:

How to make a distributed BlockMatrix out of Matrices (of the same size)?

For example, let A and B be two 2-by-2 mllib.linalg.Matrices, defined as follows:

import org.apache.spark.mllib.linalg.{Matrix, Matrices}
import org.apache.spark.mllib.linalg.distributed.BlockMatrix

val A: Matrix = Matrices.dense(2, 2, Array(1.0, 2.0, 3.0, 4.0))
val B: Matrix = Matrices.dense(2, 2, Array(5.0, 6.0, 7.0, 8.0))
val C = new BlockMatrix(???)

How can I first build an RDD[((Int, Int), Matrix)] from A and B, and then construct a distributed BlockMatrix from it?

Thanks in advance for any comments or help.

Answer 1:

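You can construct the BlockMatrix by first creating the RDD[((Int, Int), Matrix)], where each key (i, j) is the block's row and column index in the grid of blocks (sc is the SparkContext, as provided by spark-shell):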

import org.apache.spark.rdd.RDD

// A goes to block position (0, 0), B to block position (0, 1)
val blocks: RDD[((Int, Int), Matrix)] = sc.parallelize(Seq(((0, 0), A), ((0, 1), B)))

and then passing it to the BlockMatrix constructor, together with the number of rows and columns per block (2 and 2 here):

val blockMatrix: BlockMatrix = new BlockMatrix(blocks, 2, 2)

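This gives you a 2 x 4 BlockMatrix of the form [A | B]: one block row and two block columns. To stack the blocks vertically instead, use the keys (0, 0) and (1, 0).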
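
For reference, here is a minimal, self-contained sketch of the same construction that can run outside spark-shell. The object name BlockMatrixSketch, the helper sideBySide, and the local[*] master are assumptions made for illustration and are not part of the original answer. The Spark calls used (BlockMatrix.validate, numRows, numCols, toLocalMatrix) are existing MLlib methods.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.linalg.{Matrices, Matrix}
import org.apache.spark.mllib.linalg.distributed.BlockMatrix
import org.apache.spark.rdd.RDD

object BlockMatrixSketch {

  // Hypothetical helper: places equally sized matrices side by side,
  // i.e. in block row 0 and block columns 0, 1, 2, ...
  def sideBySide(sc: SparkContext, mats: Seq[Matrix]): BlockMatrix = {
    require(mats.nonEmpty, "need at least one block")
    val rows = mats.head.numRows
    val cols = mats.head.numCols
    require(mats.forall(m => m.numRows == rows && m.numCols == cols),
      "all blocks must have the same size")

    // Key each matrix by its (blockRowIndex, blockColIndex).
    val blocks: RDD[((Int, Int), Matrix)] =
      sc.parallelize(mats.zipWithIndex.map { case (m, j) => ((0, j), m) })

    val bm = new BlockMatrix(blocks, rows, cols)
    bm.validate() // throws if block sizes or indices are inconsistent
    bm
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local master, for trying this out on one machine.
    val conf = new SparkConf().setMaster("local[*]").setAppName("BlockMatrixSketch")
    val sc = new SparkContext(conf)
    try {
      // Matrices.dense takes its values in column-major order.
      val A: Matrix = Matrices.dense(2, 2, Array(1.0, 2.0, 3.0, 4.0))
      val B: Matrix = Matrices.dense(2, 2, Array(5.0, 6.0, 7.0, 8.0))

      val C = sideBySide(sc, Seq(A, B))
      println(s"${C.numRows()} x ${C.numCols()}") // prints "2 x 4"
      println(C.toLocalMatrix())                  // collects all blocks to the driver
    } finally {
      sc.stop()
    }
  }
}
```

Note that toLocalMatrix collects every block to the driver, so it is only suitable for checking small matrices like this one.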