I would like to compute a Cartesian product of the nodes of a graph in order to build their distance matrix. Maybe this is not a very good approach, so any suggestion is welcome.
This is my code, and it's not working. I don't get any warning or exception; it just does not work. I think it may be because I'm trying to make a Cartesian product of an RDD with itself, but I don't know how to fix it, or how to write a nested loop or something similar that would help me compute this matrix.
val indexes1 = graph.vertices.map(_._1)
val indexes2 = graph.vertices.map(_._1)
val cartesian = indexes1.cartesian(indexes2).cache()
cartesian.map(pair => matrix.updated(pair._1, shortPathBetween(pair._1, pair._2)))
def shortPathBetween(v1: VertexId, v2: VertexId): Int = {
  val path = ShortestPaths.run(graph, Seq(v2))
  val shortestPath = path.vertices
    .filter { case (vId, _) => vId == v1 }
    .first()
    ._2
    .get(v2)
  shortestPath.getOrElse(-1)
}
I would approach this using the Pregel API, which lets you traverse the graph from every node in parallel. If you keep track of the distances and add the edge weight at each step of the traversal, every vertex ends up holding its distance to each other (reachable) vertex.
For example, take this directed graph:
You can initialize it in Spark GraphX like this:
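A minimal sketch of such an initialization, assuming a small hypothetical weighted directed graph. The vertex ids, edge weights, the `GraphInitSketch` and `buildGraph` names, and the local `SparkSession` are all illustrative assumptions, not the graph pictured above. Each vertex starts with a map that holds only the distance to itself.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.graphx.{Edge, Graph, VertexId}
import org.apache.spark.sql.SparkSession

object GraphInitSketch {
  // Builds a hypothetical 4-vertex directed graph with Double edge weights.
  def buildGraph(sc: SparkContext): Graph[Map[VertexId, Double], Double] = {
    // Edge(source, destination, weight); these values are made up for illustration.
    val edges = sc.parallelize(Seq(
      Edge(1L, 2L, 1.0),
      Edge(2L, 3L, 2.0),
      Edge(1L, 3L, 5.0),
      Edge(3L, 4L, 1.0)
    ))
    // Vertices are derived from the edge endpoints; each one starts out
    // knowing only the distance to itself (0.0).
    Graph.fromEdges(edges, defaultValue = Map.empty[VertexId, Double])
      .mapVertices((id, _) => Map(id -> 0.0))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("graph-init-sketch")
      .getOrCreate()
    val graph = buildGraph(spark.sparkContext)
    graph.vertices.collect().foreach(println)
    spark.stop()
  }
}
```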
The pregel call takes 3 functions:
vprog: to initialize each vertex with a message (in this case an empty Map[VertexId, Double] to keep track of distances)
sendMsg: an update step that is applied on each iteration (in this case updating the distances by adding the weight of the edge and returning an Iterator with messages to send out to the next iteration)
mergeMsg: to merge two messages (2 Map[VertexId, Double]s into 1, keeping shortest distance)
In code this could look like:
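One possible sketch of these three functions, not necessarily the exact code originally posted. It assumes the vertex attribute is a `Map[VertexId, Double]` of known distances, the edge attribute is a non-negative `Double` weight, and that messages travel against the edge direction so that a source vertex learns what is reachable through its destination. The `DistanceFunctionsSketch` object and the `Distances` alias are hypothetical names.

```scala
import org.apache.spark.graphx.{EdgeTriplet, VertexId}

object DistanceFunctionsSketch {
  type Distances = Map[VertexId, Double]

  // mergeMsg: combine two distance maps, keeping the smaller distance per target.
  def mergeMsg(a: Distances, b: Distances): Distances =
    (a.keySet ++ b.keySet).map { k =>
      k -> math.min(
        a.getOrElse(k, Double.PositiveInfinity),
        b.getOrElse(k, Double.PositiveInfinity))
    }.toMap

  // vprog: fold the incoming message into the vertex state. The distance of a
  // vertex to itself is pinned to 0.0 (assumes non-negative edge weights).
  def vprog(id: VertexId, attr: Distances, msg: Distances): Distances =
    mergeMsg(attr, msg).updated(id, 0.0)

  // sendMsg: everything the destination can reach, the source can reach at the
  // destination's distance plus the edge weight. A message is sent only when it
  // would improve the source's map, so the iteration stops once nothing changes.
  def sendMsg(triplet: EdgeTriplet[Distances, Double]): Iterator[(VertexId, Distances)] = {
    val viaDst = triplet.dstAttr.map { case (target, d) => target -> (d + triplet.attr) }
    if (mergeMsg(triplet.srcAttr, viaDst) != triplet.srcAttr)
      Iterator((triplet.srcId, viaDst))
    else
      Iterator.empty
  }
}
```

With a `graph` of type `Graph[Map[VertexId, Double], Double]`, these would be passed as `graph.pregel(Map.empty[VertexId, Double])(vprog, sendMsg, mergeMsg)`, where the empty map is the initial message.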
Then run the pregel, collect the vertices and pivot the map to get a distance matrix.
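The pivot step can be sketched in plain Scala once the vertices have been collected to the driver. Here `collected` stands for the output of something like `result.vertices.collect()`. The `PivotSketch` and `pivot` names are hypothetical, and marking unreachable pairs with `Double.PositiveInfinity` is my own assumption.

```scala
object PivotSketch {
  type VertexId = Long // the same alias GraphX uses

  // Turns per-vertex distance maps into a dense matrix.
  // Row i / column j holds the distance from ids(i) to ids(j);
  // pairs with no known distance are marked as PositiveInfinity.
  def pivot(
      collected: Seq[(VertexId, Map[VertexId, Double])]
  ): (Seq[VertexId], Array[Array[Double]]) = {
    val ids = collected.map(_._1).sorted
    val byId = collected.toMap
    val matrix = ids.map { from =>
      ids.map(to => byId(from).getOrElse(to, Double.PositiveInfinity)).toArray
    }.toArray
    (ids, matrix)
  }
}
```

Collecting is only reasonable while the number of vertices is small enough for the full matrix to fit in driver memory, since the matrix grows quadratically with the vertex count.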
The result will look like
Maybe there are other/better ways, but this seems computationally less intense than calculating the shortest path for every pair of nodes in a cartesian product ;-)