Clustering: Cluster validation

Posted 2019-03-04 02:25

Question:

I want to apply a clustering method to a large social network dataset. The problem is how to evaluate the clustering. Yes, I know I can use external, internal, and relative cluster validation methods. I used normalized mutual information (NMI) as an external validation measure on synthetic data. I generated synthetic datasets with 5 clusters of equal size, with strong links inside each cluster and weak links between clusters, and then compared spectral clustering and modularity-based community detection on these datasets. I then applied the method with the best NMI to my real-world dataset and checked the error (cost function) of my algorithm, and the result was good. Is this a sound way to test my cost function, or should I also validate the clusters found on my real-world data?

Thanks.

Answer 1:

Try more than one measure.

There are a dozen cluster validation measures, and it's hard to predict which one is most appropriate for a given problem. The differences between them are not really understood yet, so it's best to consult more than one.
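As a rough illustration of consulting more than one measure, here is a minimal sketch that scores the same pair of labelings with both NMI and the adjusted Rand index (ARI). The function names `nmi` and `adjusted_rand` and the toy labelings are my own, not from the question. The NMI uses arithmetic-mean normalization, which is only one of several conventions, and the code assumes at least two items.

```python
from collections import Counter
from math import comb, log


def nmi(a, b):
    """Normalized mutual information, normalized by the mean of the two entropies."""
    n = len(a)
    ca, cb, cab = Counter(a), Counter(b), Counter(zip(a, b))
    # Mutual information from the contingency counts.
    mi = sum(c / n * log(c * n / (ca[x] * cb[y])) for (x, y), c in cab.items())
    ha = -sum(c / n * log(c / n) for c in ca.values())
    hb = -sum(c / n * log(c / n) for c in cb.values())
    if ha == 0 and hb == 0:
        return 1.0  # both labelings are a single cluster
    return 2 * mi / (ha + hb)


def adjusted_rand(a, b):
    """Adjusted Rand index: pair-counting agreement corrected for chance."""
    n = len(a)
    sum_ab = sum(comb(c, 2) for c in Counter(zip(a, b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(a).values())
    sum_b = sum(comb(c, 2) for c in Counter(b).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both labelings are a single cluster
    return (sum_ab - expected) / (max_index - expected)


# Toy ground truth with 3 planted clusters (illustrative data only).
truth = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]
relabeled = [2, 2, 2, 2, 0, 0, 0, 0, 1, 1, 1, 1]  # same partition, other names
noisy = [0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 0]      # one node moved per cluster

for name, pred in [("relabeled", relabeled), ("noisy", noisy)]:
    print(name, "NMI =", round(nmi(truth, pred), 3),
          "ARI =", round(adjusted_rand(truth, pred), 3))
```

Both measures ignore the cluster names, so the relabeled partition scores as a perfect match. On the noisy labeling the two scores differ noticeably, which is the kind of disagreement that makes it worth looking at more than one.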

Also note that if you don't use a normalized measure, the baseline (the score that even unrelated clusterings obtain) may be really high. So the measures are mostly useful to say "result A is more similar to result B than result C is", but should not be taken as an absolute measure of quality. They are a relative measure of similarity.
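The baseline effect can be seen with a small sketch, assuming the plain (unadjusted) Rand index as the example of a non-normalized measure; the answer does not name a specific one, and the helper names here are hypothetical. For two independent random labelings with 5 equally likely labels, a pair of nodes agrees with probability 1/25 + 16/25 = 0.68, so the plain Rand index sits near 0.68 even though the labelings are unrelated. The chance-corrected version should come out close to 0.

```python
import random
from collections import Counter
from math import comb


def pair_counts(a, b):
    """Count node pairs placed together in both labelings, in a, in b, and in total."""
    same_both = sum(comb(c, 2) for c in Counter(zip(a, b)).values())
    same_a = sum(comb(c, 2) for c in Counter(a).values())
    same_b = sum(comb(c, 2) for c in Counter(b).values())
    return same_both, same_a, same_b, comb(len(a), 2)


def rand_index(a, b):
    """Plain Rand index: fraction of pairs on which the two labelings agree."""
    same_both, same_a, same_b, total = pair_counts(a, b)
    apart_both = total - same_a - same_b + same_both
    return (same_both + apart_both) / total


def adjusted_rand_index(a, b):
    """Rand index corrected for the agreement expected by chance."""
    same_both, same_a, same_b, total = pair_counts(a, b)
    expected = same_a * same_b / total
    max_index = (same_a + same_b) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both labelings are a single cluster
    return (same_both - expected) / (max_index - expected)


rng = random.Random(0)  # fixed seed so the run is repeatable
n, k = 2000, 5          # illustrative sizes, not taken from the question
x = [rng.randrange(k) for _ in range(n)]
y = [rng.randrange(k) for _ in range(n)]  # generated independently of x

ri = rand_index(x, y)
ari = adjusted_rand_index(x, y)
print("Rand index on unrelated labelings:", round(ri, 3))
print("Adjusted Rand index on the same pair:", round(ari, 3))
```

A raw score of roughly 0.68 therefore says nothing by itself. It only becomes informative when compared against the scores of other results, or after correcting for chance.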