GraphX - Retrieving all nodes from a path

2019-01-15 21:59发布

In GraphX, is there a way to retrieve all the nodes and edges that are on a path that are of a certain length?

More specifically, I would like to get all the 10-step paths from A to B. For each path, I would like to get the list of nodes and edges.

Thanks.

1条回答
Deceive 欺骗
2楼-- · 2019-01-15 22:53

Disclaimer: This is only intended to show GraphFrames path filtering capabilities.

Well, theoretically speaking it is possible. You can use GraphFrames patterns to find paths. Lets assume your data looks as follows:

import org.graphframes.GraphFrame

val nodes = "abcdefghij".map(c =>Tuple1(c.toString)).toDF("id")

val edges = Seq(
   // Long path
  ("a", "b"), ("b", "c"), ("c", "d"),  ("d", "e"), ("e", "f"),
  // and some random nodes
  ("g", "h"), ("i", "j"), ("j", "i")
).toDF("src", "dst")

val gf = GraphFrame(nodes, edges)

and you want to find all paths with at least 5 nodes.

You can construct following path pattern:

val path = (1 to 4).map(i => s"(n$i)-[e$i]->(n${i + 1})").mkString(";")
// (n1)-[e1]->(n2);(n2)-[e2]->(n3);(n3)-[e3]->(n4);(n4)-[e4]->(n5)

and filter expression to avoid cycles:

val expr = (1 to 5).map(i => s"n$i").combinations(2).map {
  case Seq(i, j) => col(i) !== col(j)
}.reduce(_ && _)

Finally quick check:

gf.find(path).where(expr).show
// +-----+---+---+-----+---+-----+---+-----+---+
// |   e1| n1| n2|   e2| n3|   e3| n4|   e4| n5|
// +-----+---+---+-----+---+-----+---+-----+---+
// |[a,b]|[a]|[b]|[b,c]|[c]|[c,d]|[d]|[d,e]|[e]|
// |[b,c]|[b]|[c]|[c,d]|[d]|[d,e]|[e]|[e,f]|[f]|
// +-----+---+---+-----+---+-----+---+-----+---+
查看更多
登录 后发表回答