Finding root vertices in largeish dataset in igraph

Posted 2019-06-06 16:48

Suppose you have a graph that you've made from an edge list, with a couple hundred vertices. What I'm looking to do is identify the initial set of vertices from which all subsequent ones descend (like a mother, or a family tree).

This is a data set that represents 'ice islands', large tabular sheets of ice that break off from glaciers and float around the sea. The initial fractures represent the root nodes. The subsequent vertices are re-observations of these pieces that are either smaller (melted islands) or pieces that have broken off (so the source vertex has two outgoing edges that lead to two new vertices).

Is there a piece of code or a function that can do this easily for me? If I add labels to my plot it's impossible to read. Most of the methods of manipulating root nodes that I've been able to find involve small sample data sets where you just arbitrarily name things in the graph, or use the vertex's actual name. My data is stuff coming from a huge established CSV with super long number-character names. It makes it difficult.

I'm also super new to coding and R is a nightmare for me to use. Please be gentle and use simple examples! I can attach my code if you think it helps, all my data is being pulled out from a server and I don't know if it will be very clear from your perspective.

Thanks.

1 answer
#2 · 2019-06-06 17:09

For any node n, neighbors(g, n, mode="in") returns the nodes that have an edge into n. A node is an initial vertex if it does not have any edges coming into it. So you can test every node for how many edges enter it and select those for which the answer is zero.

Here is a simple example graph:

library(igraph)
set.seed(2017)
g = erdos.renyi.game(12, 20, type="gnm", directed=TRUE)
plot(g)

[Figure: Example Graph]

Now we can find the root nodes.

which(sapply(sapply(V(g), 
    function(x) neighbors(g,x, mode="in")), length) == 0)
[1] 1 2

This says that nodes 1 and 2 are sources.

Since you say that you are a beginner, let me explain this just a little.

function(x) neighbors(g,x, mode="in") is a function that takes a node as an argument and uses neighbors to return a list of nodes y that have a link from y to x (the parents of x).

sapply(V(g), function(x) neighbors(g,x, mode="in")) applies that function to all of the nodes in the graph, and so gives a list of the parents for every node. We are interested in the nodes that have no parents so we want the nodes for which the length of this list is zero. Thus, we apply length to the list of parents and check which lengths are zero.
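
Since the question mentions long vertex names read from a CSV, here is a minimal sketch of the same "no incoming edges" test on a named graph. It uses degree(g, mode="in") in place of counting neighbors one node at a time, which is a swapped-in but equivalent check. The data frame edges, its from/to columns, and the ISLAND_... IDs are made up for illustration; substitute your own edge list.

```r
library(igraph)

# Hypothetical edge list: each row is "parent piece -> re-observed or calved piece".
# The IDs are invented to mimic long names coming from a CSV.
edges <- data.frame(
  from = c("ISLAND_20080701_A", "ISLAND_20080701_A",
           "ISLAND_20080715_A1", "ISLAND_20090302_B"),
  to   = c("ISLAND_20080715_A1", "ISLAND_20080715_A2",
           "ISLAND_20080801_A1", "ISLAND_20090320_B"),
  stringsAsFactors = FALSE
)

# Build a directed graph; vertex names are taken from the two columns.
g2 <- graph_from_data_frame(edges, directed = TRUE)

# In-degree = number of edges pointing into each vertex (a named vector).
in_deg <- degree(g2, mode = "in")

# Root vertices have no incoming edges; keep their names rather than indices.
roots <- names(in_deg)[in_deg == 0]
print(roots)

# Everything reachable from the first root (the root itself is included).
family <- subcomponent(g2, roots[1], mode = "out")$name
print(family)
```

Working with names instead of plot labels avoids the unreadable plot: roots is just a character vector that you can print, count, or write to a file.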
