spectral clustering

2019-04-08 01:10发布

问题:

First off I must say that I'm new to matlab (and to this site...) , so please excuse my ignorance.

I'm trying to write a function in matlab that will use Spectral Clustering to split a set of points into two clusters.

my code is as follows

function Groups = TrySpectralClustering(data)
dist_mat = squareform(pdist(data));

W=  zeros(length(data),length(data));

for i=1:length(data),
    for j=(i+1):length(data),
    W(i,j)=10^(-dist_mat(i,j));
    W(j,i)=W(i,j);
    end
end
D = zeros(length(data),length(data));
for i=1:length(W),
D(i,i)=sum(W(i,:));
end
L=D-W;
L=D^(-0.5)*L*D^(-0.5);
[ V E ] = eig(L);
disp ('V:');
disp (V);

If I understand correctly, then by using the second smallest eigenvector I should be able to perform a partition of the data into two clusters - If the ith member of the 2nd eigenvector is positive, the ith data point would be in the one cluster, otherwise it would be in the other cluster.

However, when I try the following

f=[1,1;0,0;1,0;0,1;100,100;100,101;101,101;101,100]
TrySpectralClustering(f)

I would expect that the first four points would form one cluster, and the last four would form another.

However, I receive

V:
   -0.0000   -0.5000    0.0000   -0.5777    0.0000    0.4078   -0.0000    0.5000
   -0.0000   -0.5000    0.0000    0.5777    0.0000   -0.4078   -0.0000    0.5000
   -0.0000   -0.5000    0.0000    0.4078    0.0000    0.5777   -0.0000   -0.5000
   -0.0000   -0.5000    0.0000   -0.4078    0.0000   -0.5777   -0.0000   -0.5000
   -0.5000   -0.0000   -0.0000   -0.0000   -0.7071   -0.0000    0.5000   -0.0000
   -0.5000   -0.0000    0.7071    0.0000   -0.0000   -0.0000   -0.5000   -0.0000
   -0.5000    0.0000   -0.0000    0.0000    0.7071    0.0000    0.5000    0.0000
   -0.5000         0   -0.7071         0         0         0   -0.5000         0

Taking the 2nd eigenvector

  -0.0000   -0.5000    0.0000    0.5777    0.0000   -0.4078   -0.0000    0.5000

I find the one cluster includes the points 1,0;0,1;100,100;101,100 and the other cluster is made from the points 1,1;0,0;100,101;101,101

I wonder what am I doing wrong.

Note: I am working on the above as a part of a homework project.

Thanks in advance!

回答1:

What you are getting is correct. Let U be the matrix containing the eigenvectors as shown above and let them be arranged such that the 1st column corresponds to the smallest eigenvalue and progressive columns correspond to the ascending eigenvalues. Then, take a subset of columns of U by retaining the eigenvectors corresponding to the smaller eigenvalues. Now, read these columns row-wise into a new set of vectors, call it Y. Cluster Y to get the spectral clusters. So, let us assume our subset is only the first column. We clearly see that if u were to cluster the first column, u would get the first 4 into 1 cluster and the next 4 into another cluster, which is what you want.



回答2:

Take a look at the implementation on Prof. J. Shi's webpage. Pay close attention to discretisation.m function.

Moreover, your code is very inefficient. You need to take more advantage of Matlab's vectorization:

W = 10.^( - dist_mat ); % single liner of nested loop for comuting W
% computing the symmetric laplacian
d = sum( W, 2 ); % sum each row
d( d == 0 ) = 1; % avoid division by zero
d_half = 1./sqrt( d );
L = eye( n ) - bsxfun( @times, bsxfun( @times, W, d_half' ),  d_half );


回答3:

Two observations:

  1. L=D-W; L=D^(-0.5)*L*D^(-0.5); Why do you let him calculate the identity matrix? Just use the identity matrix eye(n) and substract D^(-0.5) * W * D^(-0.5) from that to calculate the Laplacian L

  2. eig returns the eigenvectors as columns, why do you take the row? Did you check the values of the corresponding eigenvalues in E, so you can be sure you are looking at a eigenvec corresponding to the 2nd smallest eigenval?