Markov Clustering

2019-06-22 16:32发布

问题:

I have two questions to be precise. Firstly, I would like to know if there is an easy way to adapt the Markov Clustering Algorithm so that I can specify in advance, how many clusters I would like to have at the end. If not, which similiar algorithm would you recommend?

And secondly how should be dealt with overlapping clusters in the Markov world?

回答1:

1). There is no easy way to adapt the MCL algorithm (note: its name is 'Markov cluster algorithm' without the 'ing'. Many people verbalise it as in 'doing Markov clustering', which is fine) to output a specified number of clusters. This is in my opinion, for 99.99% of the time a highly desirable feature. If I were to do what you want, I would generate 4 or 5 clusterings at different levels of granularity (say setting the MCL inflation parameter to 1.4, 2.0, 3.0, 4.0 and 6.0, but it could be worthwhile to do a few more and pick based on the distribution of cluster sizes), then unify them in a hierarchical clustering (the program 'clm close' can do that). After that one could traverse the tree and try to find an optimal clustering of the desired size. This obviously requires significant effort. I have done something similar but not quite the same in the past.

2). Overlapping clusterings produced by MCL are extremely rare, and always a result of symmetry in the input graph. The standard MCL implementation that most people use (from http://micans.org/mcl/) will remove overlap. This in my opinion is not a concern. Disclaimer: I authored MCL.