In GraphX, is there a way to retrieve all the nodes and edges that are on a path that are of a certain length?
More specifically, I would like to get all the 10-step paths from A to B.
For each path, I would like to get the list of nodes and edges.
Thanks.
Disclaimer: This is only intended to show GraphFrames path filtering capabilities.
Well, theoretically speaking it is possible. You can use GraphFrames patterns to find paths. Lets assume your data looks as follows:
import org.graphframes.GraphFrame
val nodes = "abcdefghij".map(c =>Tuple1(c.toString)).toDF("id")
val edges = Seq(
// Long path
("a", "b"), ("b", "c"), ("c", "d"), ("d", "e"), ("e", "f"),
// and some random nodes
("g", "h"), ("i", "j"), ("j", "i")
).toDF("src", "dst")
val gf = GraphFrame(nodes, edges)
and you want to find all paths with at least 5 nodes.
You can construct following path pattern:
val path = (1 to 4).map(i => s"(n$i)-[e$i]->(n${i + 1})").mkString(";")
// (n1)-[e1]->(n2);(n2)-[e2]->(n3);(n3)-[e3]->(n4);(n4)-[e4]->(n5)
and filter expression to avoid cycles:
val expr = (1 to 5).map(i => s"n$i").combinations(2).map {
case Seq(i, j) => col(i) !== col(j)
}.reduce(_ && _)
Finally quick check:
gf.find(path).where(expr).show
// +-----+---+---+-----+---+-----+---+-----+---+
// | e1| n1| n2| e2| n3| e3| n4| e4| n5|
// +-----+---+---+-----+---+-----+---+-----+---+
// |[a,b]|[a]|[b]|[b,c]|[c]|[c,d]|[d]|[d,e]|[e]|
// |[b,c]|[b]|[c]|[c,d]|[d]|[d,e]|[e]|[e,f]|[f]|
// +-----+---+---+-----+---+-----+---+-----+---+