GraphX - Retrieving all nodes from a path

2019-01-15 22:28发布

问题:

In GraphX, is there a way to retrieve all the nodes and edges that are on a path that are of a certain length?

More specifically, I would like to get all the 10-step paths from A to B. For each path, I would like to get the list of nodes and edges.

Thanks.

回答1:

Disclaimer: This is only intended to show GraphFrames path filtering capabilities.

Well, theoretically speaking it is possible. You can use GraphFrames patterns to find paths. Lets assume your data looks as follows:

import org.graphframes.GraphFrame

val nodes = "abcdefghij".map(c =>Tuple1(c.toString)).toDF("id")

val edges = Seq(
   // Long path
  ("a", "b"), ("b", "c"), ("c", "d"),  ("d", "e"), ("e", "f"),
  // and some random nodes
  ("g", "h"), ("i", "j"), ("j", "i")
).toDF("src", "dst")

val gf = GraphFrame(nodes, edges)

and you want to find all paths with at least 5 nodes.

You can construct following path pattern:

val path = (1 to 4).map(i => s"(n$i)-[e$i]->(n${i + 1})").mkString(";")
// (n1)-[e1]->(n2);(n2)-[e2]->(n3);(n3)-[e3]->(n4);(n4)-[e4]->(n5)

and filter expression to avoid cycles:

val expr = (1 to 5).map(i => s"n$i").combinations(2).map {
  case Seq(i, j) => col(i) !== col(j)
}.reduce(_ && _)

Finally quick check:

gf.find(path).where(expr).show
// +-----+---+---+-----+---+-----+---+-----+---+
// |   e1| n1| n2|   e2| n3|   e3| n4|   e4| n5|
// +-----+---+---+-----+---+-----+---+-----+---+
// |[a,b]|[a]|[b]|[b,c]|[c]|[c,d]|[d]|[d,e]|[e]|
// |[b,c]|[b]|[c]|[c,d]|[d]|[d,e]|[e]|[e,f]|[f]|
// +-----+---+---+-----+---+-----+---+-----+---+