Louvain community detection in R using igraph - fo

Posted 2019-07-23 06:28

Question:

I have a correlation matrix of scores that I would like to run community detection on using the Louvain method in igraph, in R. I converted the correlation matrix to a distance matrix using cor2dist, as below:

distancematrix <- cor2dist(correlationmatrix)

This gives a 400 x 400 matrix of distances ranging from 0 to 2. I then made the list of edges (the distances) and vertices (each of the 400 individuals) using the method below, from http://kateto.net/networks-r-igraph (section 3.1).

library(igraph)
test <- as.matrix(distancematrix)
mode(test) <- "numeric"
test2 <- graph.adjacency(test, mode = "undirected", weighted = TRUE, diag = TRUE)
E(test2)$weight
get.edgelist(test2)

From this I then wrote csv files of the 'from' and 'to' edge list, and corresponding weights:

edgeweights <- E(test2)$weight
write.csv(edgeweights, file = "edgeweights.csv")
fromtolist <- get.edgelist(test2)
write.csv(fromtolist, file = "fromtolist.csv")

From these two files I produced a .csv file called "nodes.csv" which simply had all the vertex IDs for the 400 individuals:

id
1
2
3
4
...
400

And a .csv file called "edges.csv", which detailed 'from' and 'to' between each node, and provided the weight (i.e. the distance measure) for each of these edges:

from    to   weight
1       2    0.99
1       3    1.20
1       4    1.48
...
399     400  0.70

I then tried to use this node and edge list to create an igraph object, and run louvain clustering in the following way:

nodes <- read.csv("nodes.csv", header = TRUE, as.is = TRUE)
edges <- read.csv("edges.csv", header = TRUE, as.is = TRUE)
clustergraph <- graph_from_data_frame(edges, directed = FALSE, vertices = nodes)
clusterlouvain <- cluster_louvain(clustergraph)

Unfortunately, this did not perform the Louvain community detection correctly. I expected it to return around 2-4 different communities, which could be plotted similarly to here, but sizes(clusterlouvain) returned:

Community sizes
 1 
 400

indicating that all individuals were sorted into the same community. The clustering also ran immediately (i.e. with almost no computation time), which also makes me think it was not working correctly.

My question is: Can anyone suggest why the cluster_louvain method did not work and identified just one community? I think I must be specifying the distance matrix or edges/nodes incorrectly, or in some other way not giving the correct input to the cluster_louvain method. I am relatively new to R so would be very grateful for any advice. I have successfully used other methods of community detection on the same distance matrix (i.e. k-means) which identified 2-3 communities, but would like to understand what I have done wrong here.

I'm aware there are multiple other queries about using igraph in R, but I have not found one which explicitly specifies the input format of the edges and nodes (from a correlation matrix) to get the louvain community detection working correctly.

Thank you for any advice! I can provide further information if helpful.

Answer 1:

I believe that cluster_louvain did exactly what it should do with your data. The problem is your graph. Your code included the line get.edgelist(test2). That must produce a lot of output. Instead, try this:

vcount(test2)
ecount(test2)

Since you say that your correlation matrix is 400x400, I expect vcount to give 400 and ecount to give 79800 = 400 * 399 / 2. As you have constructed it, every node is directly connected to all other nodes. Of course there is only one big community.
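The edge count can be checked on a small made-up example. The sketch below uses random data (not the asker's scores), with an illustrative n of 10 variables. It computes the distance directly as sqrt(2 * (1 - r)), which is assumed here to match the default behaviour of psych::cor2dist; all variable names are hypothetical.

```r
library(igraph)

set.seed(1)
n <- 10                                     # hypothetical number of variables
scores <- matrix(rnorm(50 * n), ncol = n)   # made-up data, 50 observations
r <- cor(scores)

# Assumed equivalent of cor2dist(r); pmax() guards against tiny negative
# values caused by floating-point rounding.
d <- sqrt(2 * pmax(1 - r, 0))
diag(d) <- 0                                # no self-loops

g <- graph_from_adjacency_matrix(d, mode = "undirected", weighted = TRUE)
vcount(g)          # number of vertices
ecount(g)          # number of edges
n * (n - 1) / 2    # edge count of a complete graph on n vertices
```

Every off-diagonal distance is non-zero here, so every pair of vertices receives an edge and the last two values agree.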

I suspect that what you are trying to do is group variables that are correlated. If the correlation is near zero, the variables should be unconnected. What seems less clear is what to do with variables with correlation near -1. Do you want them to be connected or not? We can do it either way.
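To make the two options concrete, here is a minimal base-R sketch on a hypothetical 3 x 3 correlation matrix. The values, the labels A, B, C and the 0.33 cutoff are illustrative only.

```r
# Hypothetical correlations: A-B strongly positive,
# A-C strongly negative, B-C close to zero.
r <- matrix(c( 1.0,  0.9, -0.8,
               0.9,  1.0,  0.1,
              -0.8,  0.1,  1.0),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("A", "B", "C"), c("A", "B", "C")))
cutoff <- 0.33   # illustrative threshold

keep_abs    <- abs(r) >= cutoff   # keeps A-B and A-C
keep_signed <- r >= cutoff        # keeps only A-B

keep_abs["A", "C"]      # TRUE: the negative correlation stays connected
keep_signed["A", "C"]   # FALSE: the negative correlation is dropped
```

In both variants the near-zero B-C pair is dropped; the only difference is how the strongly negative A-C pair is treated.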

You do not provide any data, so I will illustrate with the Ionosphere data from the mlbench package. I will try to mimic your code pretty closely, but will change a few variable names. Also, for my purposes, it makes no sense to write the edges to a file and then read them back again, so I will just directly use the edges that are constructed.
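If a from/to/weight table is still wanted, it can be taken straight from the graph instead of going through CSV files. The sketch below is one way to do that, assuming a small made-up weight matrix; m, edges_df and nodes_df are hypothetical names.

```r
library(igraph)

# Small made-up symmetric weight matrix; zeros mean "no edge".
m <- matrix(c(0.0, 0.5, 0.0, 0.2,
              0.5, 0.0, 0.7, 0.0,
              0.0, 0.7, 0.0, 0.0,
              0.2, 0.0, 0.0, 0.0),
            nrow = 4, byrow = TRUE)
g <- graph_from_adjacency_matrix(m, mode = "undirected", weighted = TRUE)

# Edge list with columns from, to, weight, taken directly from the graph.
edges_df <- igraph::as_data_frame(g, what = "edges")
nodes_df <- data.frame(id = seq_len(vcount(g)))

# Rebuild the graph from the data frames, with no CSV round trip.
g2 <- graph_from_data_frame(edges_df, directed = FALSE, vertices = nodes_df)
```

Passing the vertices argument keeps any vertex that has no edges, which would otherwise be lost when rebuilding from the edge list alone.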

First, assume that you want variables with correlation near -1 to be connected.

library(igraph)
library(mlbench)    # for Ionosphere data
library(psych)      # for cor2dist
data(Ionosphere)

correlationmatrix = cor(Ionosphere[, which(sapply(Ionosphere, class) == 'numeric')])
distancematrix <- cor2dist(correlationmatrix)

DM1 <- as.matrix(distancematrix)
## Zero out connections where there is low (absolute) correlation
## Keeps connection for cor ~ -1
## You may wish to choose a different threshold
DM1[abs(correlationmatrix) < 0.33] = 0

G1 <- graph.adjacency(DM1, mode = "undirected", weighted = TRUE, diag = TRUE)
vcount(G1)
[1] 32
ecount(G1)
[1] 140

Not a fully connected graph! Now let's find the communities.

clusterlouvain <- cluster_louvain(G1)
## One colour per community found (the number can vary between runs)
plot(G1, vertex.color=rainbow(length(clusterlouvain), alpha=0.6)[clusterlouvain$membership])

If, instead, you do not want variables with negative correlation to be connected, just get rid of the absolute value above. This graph should be much less connected:

DM2 <- as.matrix(distancematrix)
## Zero out connections where there is low correlation
DM2[correlationmatrix < 0.33] = 0

G2 <- graph.adjacency(DM2, mode = "undirected", weighted = TRUE, diag = TRUE)
clusterlouvain <- cluster_louvain(G2)
## One colour per community found (the number can vary between runs)
plot(G2, vertex.color=rainbow(length(clusterlouvain), alpha=0.6)[clusterlouvain$membership])
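Besides plotting, the result can be inspected numerically, as the question did with sizes(). The sketch below assumes a made-up toy graph (two triangles joined by one edge) rather than the Ionosphere graphs above; g and cl are hypothetical names.

```r
library(igraph)

# Made-up graph: two triangles (1-2-3 and 4-5-6) joined by the edge 3-4.
g <- make_graph(c(1, 2,  2, 3,  1, 3,
                  4, 5,  5, 6,  4, 6,
                  3, 4), directed = FALSE)

cl <- cluster_louvain(g)
length(cl)        # number of communities found
sizes(cl)         # number of vertices in each community
membership(cl)    # community id for each vertex
modularity(cl)    # modularity of the partition found
```

Running the same calls on the clusterlouvain objects above shows how many communities each thresholding choice produces. The Louvain method involves a random vertex ordering, so results can differ slightly between runs.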