This was an interview question I got asked by a recruiter, the problem is basically to calculate the shortest path of all node to every node, and my solution was the following
initiate all possible edges (without reverse A - B is the same as B-A)
Each node will be represent in the following (src, cost, current_list, dest) , the src and dest is basically all the possible edges we initiate earlier
Map:
for each edge you traverse, you duplicate your tuple and add the current
traversed node to the cost and current list.
if the node is the destination you annotate finish, if the the node is
in the current list, you annotate delete
Reduce:
Don't really need to do anything besides outputting finish and deleting
delete and let the other node go through the next round of map
And by outputting I mean for each src, dest pair only output the least cost
The recruiter say this is not efficient, I can see how this is not efficient since you are traversing combinatorialy, but the only alternative I can think of is if you have n node, then spawn n servers and do dijkstra for each node which the recruiter say is also wrong. Can someone give me some help on this problem?
Edit:
Ex. Triangle Graph
The edges are A-B, B-C, C-A with path cost of 1
Algorithm
- First we initiate all possible source destination pair keeping in mind that reverse of edge is not unique A-B, A-C, B-C (B-A, C-A, B-C is omitted)
for each source destination pair we have the following tuple
(src=A, cost=None, current_list=A, dest=B, annotate=continue)
(src=A, cost=None, current_list=A, dest=C, annotate=continue)
(src=B, cost=None, current_list=B, dest=C, annotate=continue)
Now we start the map reduce algorithm
for each tuple in the tuple list we initiate: for each neighbor of the node at the end of current_list if the next neighbor is already in the current_list set annotate = delete elif the next neighbor is the dest set annotate = finish add path cost to cost else duplicate the current node add neighbor to current_list add path cost to cost delete the current tuple
In our case
(src=A, cost=None, current_list=A, dest=B, annotate=continue)
=>
(src=A, cost=1, current_list=AB, dest=B, annotate=finish)
(src=A, cost=1, current_list=AC, dest=B, annotate=continue)
(src=A, cost=None, current_list=A, dest=C, annotate=continue)
=>
(src=A, cost=1, current_list=AC, dest=C, annotate=finish)
(src=A, cost=1, current_list=AB, dest=C, annotate=continue)
(src=B, cost=None, current_list=B, dest=C, annotate=continue)
=>
(src=B, cost=1, current_list=BC, dest=C, annotate=finish)
(src=B, cost=1, current_list=BA, dest=C, annotate=continue)
Reduce
Note: we reduce on src, dest pair, and use it as our key for every tuple in tuple list
if annotate == finish keep trace of min cost and delete tuple for each src dest pair that is not the current min then pass the current min as result elif annotate == delete delete the tuple else pass down to the next round of map
Map
Since we still have some tuple that have annotate = continue
(src=B, cost=1, current_list=BA, dest=C, annotate=continue)
=>
(src=B, cost=2, current_list=BAC, dest=C, annotate=finish)
(src=B, cost=2, current_list=BAB, dest=C, annotate=delete)
(src=A, cost=1, current_list=AC, dest=B, annotate=continue)
=>
(src=A, cost=2, current_list=ACB, dest=B, annotate=finish)
(src=A, cost=2, current_list=ACA, dest=B, annotate=delete)
(src=A, cost=1, current_list=AB, dest=C, annotate=continue)
=>
(src=A, cost=2, current_list=ABC, dest=C, annotate=finish)
(src=A, cost=2, current_list=ABA, dest=C, annotate=delete)
- Reduce
We have no continue tuples, now we just use the reduce to find the min for each src dest pair
The inner two loops of Floyd-Warshall are essentially matrix multiplication with addition replaced by min and multiplication replaced by addition. You can do matrix multiplication with a map-reduce, so you can implement Floyd Warshall with |V| map-reduces.
From the wikipedia page on Floyd-Warshall:
The inner two loops (
i
andj
, lines 7 to 11) are structurally the same as matrix multiplication, and you can adapt any "matrix multiplication on map-reduce" solution to perform this.The outer (
k
) loop becomes |V| map-reduces.I would like to propose the following approach - for finding of shortest paths in graph via map-reduce.
Lets start with tiny example, which will lead to intuition regarding further implementation of algorithm.
Imagine, that information about graph is stored in form of adjacency lists (with payload, which represent paths between corresponding nodes). For example:
A -> [ {B, "A-B"}, {C, "A-C"}, {D, "A-D"} ]
From given example - we can "infer" information about the following connections in graph:
1) Direct connections
A -> B
(path:"A-B"
)A -> C
(path:"A-C"
)A -> D
(path:"A-D"
)2) Transitive connections through node
A
B -> C
(path:"B-A-C"
)(where
path("B -> C") == reverse(path("A -> B")) + path("A -> C")
)C -> B
(path:"C-A-B"
)C -> D
(path:"C-A-D"
)D -> C
(path:"D-A-C"
)D -> B
(path:"D-A-B"
)B -> D
(path:"B-A-D"
)In other words: we just "mapped" one entry of adjacency list - to multiple pairs of mutually accessible nodes (for all of generated pairs - there exists a path).
Each pair of nodes, actually represents connection:
Source -> Target
.So, now, we can combine all pairs - which has the same source node:
Source -> [{Target 1, "Path-to-Target-1"}, {Target 2, "Path-to-target-2"}, ...]
Actually, after the combination - each source will be associated with a list of target nodes: list might contain duplicated target nodes (duplicated target nodes, just corresponds to different possible paths).
So, we just need to remove duplicates from list of target nodes (to keep only that target nodes, which corresponds to the shortest paths).
Two paragraphs from above - actually describes reduce step. The output of reduce step - is the same as input to map step.
So, finally - just repeat these map-reduce steps until convergence.