This question is about MLlib (Spark 1.2.1+).
What is the best way to manipulate local matrices (moderate size, under 100x100, so they do not need to be distributed)?
For instance, after computing the SVD of a dataset, I need to perform some matrix operations.
The RowMatrix
only provides a multiply function. The toBreeze method returns a DenseMatrix<Object>,
but that API does not seem Java-friendly:
public final <TT,B,That> That $plus(B b, UFunc.UImpl2<OpAdd$,TT,B,That> op)
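As far as I can tell, the only thing that is straightforward from Java is multiplication by a local matrix on the right. A minimal sketch (B is just a placeholder local matrix; it must have as many rows as the RowMatrix has columns, and matrix is the RowMatrix built below):
import org.apache.spark.mllib.linalg.Matrices;
import org.apache.spark.mllib.linalg.Matrix;
import org.apache.spark.mllib.linalg.distributed.RowMatrix;
// ...
// Placeholder 2x2 local matrix (values are given in column-major order)
Matrix B = Matrices.dense(2, 2, new double[] {1.0, 0.0, 0.0, 1.0});
// RowMatrix.multiply only supports A * B with a local matrix B on the right
RowMatrix product = matrix.multiply(B);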
In Spark + Java, how do I do any of the following operations:
- transpose a matrix
- add/subtract two matrices
- crop a Matrix
- perform element-wise operations
- etc.
Javadoc RowMatrix: https://spark.apache.org/docs/latest/api/java/org/apache/spark/mllib/linalg/distributed/RowMatrix.html
RDD<Vector> data = ...;
RowMatrix matrix = new RowMatrix(data);
SingularValueDecomposition<RowMatrix, Matrix> svd = matrix.computeSVD(15, true, 1e-9d);
RowMatrix U = svd.U();
Vector s = svd.s();
Matrix V = svd.V();
//Example 1: How to compute transpose(U)*matrix
//Example 2: How to compute transpose(U(:,1:k))*matrix
EDIT: Thanks to dlwh for pointing me in the right direction; the following solution works:
import no.uib.cipr.matrix.DenseMatrix;
// ...
RowMatrix U = svd.U();
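// Copy U (via its Breeze representation) into a local, in-memory MTJ DenseMatrix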
DenseMatrix U_mtj = new DenseMatrix((int) U.numCols(), (int) U.numRows(), U.toBreeze().toArray$mcD$sp(), true);
// From there, matrix operations are available on U_mtj
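For anyone else landing here, below is a rough, untested sketch of how the operations listed in the question could look with MTJ. A and M are toy stand-ins for the converted U and data matrix; the calls (transpose, transAmult, add, Matrices.getSubMatrix) should be double-checked against the MTJ javadoc for your version:
import no.uib.cipr.matrix.DenseMatrix;
import no.uib.cipr.matrix.Matrices;
import no.uib.cipr.matrix.Matrix;
// ...
// Toy stand-ins: A is 3x2 (like a converted U), M is 3x2 (like the converted data matrix)
DenseMatrix A = new DenseMatrix(new double[][] {{1, 2}, {3, 4}, {5, 6}});
DenseMatrix M = new DenseMatrix(new double[][] {{1, 0}, {0, 1}, {1, 1}});

// Transpose: allocate the target (numColumns x numRows), then At = A^T
DenseMatrix At = new DenseMatrix(A.numColumns(), A.numRows());
A.transpose(At);

// Example 1 in one call: C = transpose(A) * M (2x2)
DenseMatrix C = new DenseMatrix(A.numColumns(), M.numColumns());
A.transAmult(M, C);

// Crop: a view on a sub-block, here rows 0..1 and column 0
Matrix sub = Matrices.getSubMatrix(A, new int[] {0, 1}, new int[] {0});

// Add / subtract, in place: A = A + M, then A = A - M
A.add(M);
A.add(-1.0, M);

// Element-wise operations: plain get/set loops (I did not find a built-in Hadamard product)
for (int i = 0; i < A.numRows(); i++)
    for (int j = 0; j < A.numColumns(); j++)
        A.set(i, j, A.get(i, j) * M.get(i, j));
Example 2 (transpose(U(:,1:k))*matrix) would then just be a getSubMatrix call on the first k columns of the converted U, followed by the same transAmult.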