I am using the JAMA package to perform LSA. I was told to reduce the dimensionality, so I reduced it to 3 in this case and then reconstructed the matrix. But the resulting matrix is very different from the one I gave to the system.
Here's the code:
a = new Matrix(termdoc); // get the matrix here
a = a.transpose(); // the matrix is in the form docs x terms, so I transpose it
SingularValueDecomposition sv = new SingularValueDecomposition(a);
u = sv.getU();
v = sv.getV();
s = sv.getS();
uarray = u.getArray();
sarray = s.getArray();
varray = v.getArray();
sarray_mod = new double[3][3]; //reducing dimension
uarray_mod = new double[uarray.length][3];
varray_mod = new double[3][varray.length];
move(sarray,3,3,sarray_mod); // my method to move the contents
move(uarray,uarray.length,3,uarray_mod);
move(varray,3,varray.length,varray_mod);
e = new Matrix(uarray_mod);
f = new Matrix(sarray_mod);
g = new Matrix(varray_mod);
Matrix temp =e.times(f);
result = temp.times(g);
result = result.transpose();
results = result.getArray() ;
System.out.println(" The array after svd : \n");
print(results);// my method to print the array
// Copies the top-left r x c block of sarray2 into sarrayMod.
private static void move(double[][] sarray2, int r, int c,
        double[][] sarrayMod) {
    for (int i = 0; i < r; i++)
        for (int t = 0; t < c; t++)
            sarrayMod[i][t] = sarray2[i][t];
}
A sample run with just 3 files, two of which are similar. The input matrix:
0.25 0 0 0 0 0 0 0 0.25 0 0.25 0.25 0
0 0.083 0.083 0.083 0.083 0.083 0.083 0.083 0 0.083 0.083 0.167 0.083
0.25 0 0 0 0 0 0 0 0.25 0 0.25 0.25 0
The array after svd :
0.225 0.029 0.029 0.029 0.029 0.029 0.029 0.029 0.225 0.029 0.253 0.282 0.029
-0.121 0.077 0.077 0.077 0.077 0.077 0.077 0.077 -0.121 0.077 -0.044 0.033 0.077
0.245 0.012 0.012 0.012 0.012 0.012 0.012 0.012 0.245 0.012 0.257 0.269 0.012
Go through the example here.
In the example, we take the first 2 columns of U, S and V and then multiply them. The product will not give you back the same matrix, but it improves the similarity results.
If you have gone through the example, you will see that the similarity between "user" and "human" was negative. After we performed SVD, the similarity increased to a positive value close to 1.
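To check this kind of before/after comparison on your own matrix, you can compute a similarity between two rows (or columns) of the original and of the reconstructed array. The sketch below uses cosine similarity, which is an assumption on my part: the example may use a different measure (for instance a correlation). The class name `CosineSimilarity` is just a placeholder.

```java
final class CosineSimilarity {
    /**
     * Cosine of the angle between two vectors of equal length.
     * Returns 0 when either vector is all zeros (a convention chosen here,
     * since the cosine is undefined in that case).
     */
    static double cosine(double[] x, double[] y) {
        if (x.length != y.length) {
            throw new IllegalArgumentException("vectors must have the same length");
        }
        double dot = 0.0, normX = 0.0, normY = 0.0;
        for (int i = 0; i < x.length; i++) {
            dot += x[i] * y[i];
            normX += x[i] * x[i];
            normY += y[i] * y[i];
        }
        if (normX == 0.0 || normY == 0.0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normX) * Math.sqrt(normY));
    }

    public static void main(String[] args) {
        // First and second document rows from the question's input matrix.
        double[] doc1 = {0.25, 0, 0, 0, 0, 0, 0, 0, 0.25, 0, 0.25, 0.25, 0};
        double[] doc2 = {0, 0.083, 0.083, 0.083, 0.083, 0.083, 0.083, 0.083,
                         0, 0.083, 0.083, 0.167, 0.083};
        System.out.println("doc1 vs doc2: " + cosine(doc1, doc2));
    }
}
```

Run the same call on the corresponding rows of the array printed after the SVD to see how the value changes.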
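I think the way you are moving U and S is correct. For V, keep in mind that the reconstruction is U * S * V^T, so you need the first columns of V, transposed; copying the first rows of V as they are does not give the same product. Just go through the example once.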
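One thing worth double-checking in the reconstruction step is the V factor, since the decomposition is A = U * S * V^T. Below is a minimal sketch of the rank-k product written with plain arrays so that it runs without JAMA. The class name `TruncatedSvd` and the small hand-made factors in `main` are made up for illustration and are not taken from your data.

```java
import java.util.Arrays;

final class TruncatedSvd {
    /**
     * Returns U_k * S_k * V_k^T, using the first k columns of u and v
     * and the first k diagonal entries of s.
     */
    static double[][] reconstruct(double[][] u, double[][] s, double[][] v, int k) {
        int m = u.length; // rows of the original matrix
        int n = v.length; // columns of the original matrix
        double[][] a = new double[m][n];
        for (int i = 0; i < m; i++) {
            for (int j = 0; j < n; j++) {
                for (int p = 0; p < k; p++) {
                    // v[j][p] (not v[p][j]) is what applies the transpose of V
                    a[i][j] += u[i][p] * s[p][p] * v[j][p];
                }
            }
        }
        return a;
    }

    public static void main(String[] args) {
        // Hand-made factors: U is the identity, V is a rotation (orthogonal, not symmetric).
        double[][] u = {{1, 0}, {0, 1}};
        double[][] s = {{2, 0}, {0, 1}};
        double[][] v = {{0.6, -0.8}, {0.8, 0.6}};
        // k = 2 keeps everything, so this is the full matrix U * S * V^T.
        System.out.println(Arrays.deepToString(reconstruct(u, s, v, 2)));
        // k = 1 keeps only the largest singular value.
        System.out.println(Arrays.deepToString(reconstruct(u, s, v, 1)));
    }
}
```

If you swap `v[j][p]` for `v[p][j]`, the product changes even when nothing is truncated, which is the kind of difference you are seeing. With JAMA, the equivalent should be something like `v.getMatrix(0, n - 1, 0, k - 1).transpose()` for the last factor, assuming `v` is the `Matrix` returned by `getV()` and `n` is its row count.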