I want to perform Singular Value Decomposition on a large (sparse) matrix. In order to choose the best(most accurate) library, I tried replicating the SVD example provided here using different Java and Python libraries. Strangely I am getting different results with each library.
Here's the original example matrix and it's decomposed (U S and VT) matrices:
A =2.0 0.0 8.0 6.0 0.0
1.0 6.0 0.0 1.0 7.0
5.0 0.0 7.0 4.0 0.0
7.0 0.0 8.0 5.0 0.0
0.0 10.0 0.0 0.0 7.0
U =-0.54 0.07 0.82 -0.11 0.12
-0.10 -0.59 -0.11 -0.79 -0.06
-0.53 0.06 -0.21 0.12 -0.81
-0.65 0.07 -0.51 0.06 0.56
-0.06 -0.80 0.09 0.59 0.04
VT =-0.46 0.02 -0.87 -0.00 0.17
-0.07 -0.76 0.06 0.60 0.23
-0.74 0.10 0.28 0.22 -0.56
-0.48 0.03 0.40 -0.33 0.70
-0.07 -0.64 -0.04 -0.69 -0.32
S (with the top three singular values) =
17.92 0 0
0 15.17 0
0 0 3.56
I tried using the following Java and Python libraries: Java: PColt, Jama Python: NumPy
Here are the results from each one of them:
Jama:
U = 0.5423 -0.065 -0.8216 0.1057 -0.1245
0.1018 0.5935 0.1126 0.7881 0.0603
0.525 -0.0594 0.213 -0.1157 0.8137
0.6449 -0.0704 0.5087 -0.0599 -0.5628
0.0645 0.7969 -0.09 -0.5922 -0.0441
VT =0.4646 -0.0215 0.8685 8.0E-4 -0.1713
0.0701 0.76 -0.0631 -0.6013 -0.2278
0.7351 -0.0988 -0.284 -0.2235 0.565
0.4844 -0.0254 -0.3989 0.3327 -0.7035
0.065 0.6415 0.0443 0.6912 0.3233
S = 17.9184 0.0 0.0 0.0 0.0
0.0 15.1714 0.0 0.0 0.0
0.0 0.0 3.564 0.0 0.0
0.0 0.0 0.0 1.9842 0.0
0.0 0.0 0.0 0.0 0.3496
PColt:
U = -0.542255 0.0649957 0.821617 0.105747 -0.124490
-0.101812 -0.593461 -0.112552 0.788123 0.0602700
-0.524953 0.0593817 -0.212969 -0.115742 0.813724
-0.644870 0.0704063 -0.508744 -0.0599027 -0.562829
-0.0644952 -0.796930 0.0900097 -0.592195 -0.0441263
VT =-0.464617 0.0215065 -0.868509 0.000799554 -0.171349
-0.0700860 -0.759988 0.0630715 -0.601346 -0.227841
-0.735094 0.0987971 0.284009 -0.223485 0.565040
-0.484392 0.0254474 0.398866 0.332684 -0.703523
-0.0649698 -0.641520 -0.0442743 0.691201 0.323284
S =
(00) 17.91837085874625
(11) 15.17137188041607
(22) 3.5640020352605677
(33) 1.9842281528992616
(44) 0.3495556671751232
Numpy
U = -0.54225536 0.06499573 0.82161708 0.10574661 -0.12448979
-0.10181247 -0.59346055 -0.11255162 0.78812338 0.06026999
-0.52495325 0.05938171 -0.21296861 -0.11574223 0.81372354
-0.64487038 0.07040626 -0.50874368 -0.05990271 -0.56282918
-0.06449519 -0.79692967 0.09000966 -0.59219473 -0.04412631
VT =-4.64617e-01 2.15065e-02 -8.68508e-01 7.99553e-04 -1.71349e-01
-7.00859e-02 -7.59987e-01 6.30714e-02 -6.01345e-01 -2.27841e-01
-7.35093e-01 9.87971e-02 2.84008e-01 -2.23484e-01 5.65040e-01
-4.84391e-01 2.54473e-02 3.98865e-01 3.32683e-01 -7.03523e-01
-6.49698e-02 -6.41519e-01 -4.42743e-02 6.91201e-01 3.23283e-01
S = 17.91837086 15.17137188 3.56400204 1.98422815 0.34955567
As can be noticed the sign of each element in the Jama decomposed matrices (u & VT) is opposite to the ones in the original example. Interestingly, for PColt and Numpy only the signs of the elements in the last two columns are inverted. Is there any specific reason behind the inverted signs? Has someone faced similar discrepancies?
Here are the pieces of code which I used: Java
import java.text.DecimalFormat;
import cern.colt.matrix.tdouble.DoubleMatrix2D;
import cern.colt.matrix.tdouble.algo.DenseDoubleAlgebra;
import cern.colt.matrix.tdouble.algo.decomposition.DenseDoubleSingularValueDecomposition;
import cern.colt.matrix.tdouble.impl.DenseDoubleMatrix2D;
import Jama.Matrix;
import Jama.SingularValueDecomposition;
public class SVD_Test implements java.io.Serializable{
public static void main(String[] args)
{
double[][] data2 = new double[][]
{{ 2.0, 0.0, 8.0, 6.0, 0.0},
{ 1.0, 6.0, 0.0, 1.0, 7.0},
{ 5.0, 0.0, 7.0, 4.0, 0.0},
{ 7.0, 0.0, 8.0, 5.0, 0.0},
{ 0.0, 10.0, 0.0, 0.0, 7.0}};
DoubleMatrix2D pColt_matrix = new DenseDoubleMatrix2D(5,5);
pColt_matrix.assign(data2);
Matrix j = new Matrix(data2);
SingularValueDecomposition svd_jama = j.svd();
DenseDoubleSingularValueDecomposition svd_pColt = new DenseDoubleSingularValueDecomposition(pColt_matrix, true, true);
System.out.println("U:");
System.out.println("pColt:");
System.out.println(svd_pColt.getU());
printJamaMatrix(svd_jama.getU());
System.out.println("S:");
System.out.println("pColt:");
System.out.println(svd_pColt.getS());
printJamaMatrix(svd_jama.getS());
System.out.println("V:");
System.out.println("pColt:");
System.out.println(svd_pColt.getV());
printJamaMatrix(svd_jama.getV());
}
public static void printJamaMatrix(Matrix inp){
System.out.println("Jama: ");
System.out.println(String.valueOf(inp.getRowDimension())+" X "+String.valueOf(inp.getColumnDimension()));
DecimalFormat twoDForm = new DecimalFormat("#.####");
StringBuffer sb = new StringBuffer();
for (int r = 0; r < inp.getRowDimension(); r++) {
for (int c = 0; c < inp.getColumnDimension(); c++)
sb.append(Double.valueOf(twoDForm.format(inp.get(r, c)))).append("\t");
sb.append("\n");
}
System.out.println(sb.toString());
}
}
Python:
>>> import numpy
>>> numpy_matrix = numpy.array([[ 2.0, 0.0, 8.0, 6.0, 0.0],
[1.0, 6.0, 0.0, 1.0, 7.0],
[5.0, 0.0, 7.0, 4.0, 0.0],
[7.0, 0.0, 8.0, 5.0, 0.0],
[0.0, 10.0, 0.0, 0.0, 7.0]])
>>> u,s,v = numpy.linalg.svd(numpy_matrix, full_matrices=True)
Is there something wrong with the code? .