可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效，请关闭广告屏蔽插件后再试):

问题:

This was an interview question I got asked by a recruiter, the problem is basically to calculate the shortest path of all node to every node, and my solution was the following

initiate all possible edges (without reverse A - B is the same as B-A)

Each node will be represent in the following (src, cost, current_list, dest) , the src and dest is basically all the possible edges we initiate earlier

Map:

for each edge you traverse, you duplicate your tuple and add the current   
traversed node to the cost and current list. 
if the node is the destination you annotate finish, if the the node is 
in the current list, you annotate delete

Reduce:

Don't really need to do anything besides outputting finish and deleting 
delete and let the other node go through the next round of map
And by outputting I mean for each src, dest pair only output the least cost

The recruiter say this is not efficient, I can see how this is not efficient since you are traversing combinatorialy, but the only alternative I can think of is if you have n node, then spawn n servers and do dijkstra for each node which the recruiter say is also wrong. Can someone give me some help on this problem?

Edit:

Ex. Triangle Graph

The edges are A-B, B-C, C-A with path cost of 1

Algorithm

First we initiate all possible source destination pair keeping in mind that reverse of edge is not unique A-B, A-C, B-C (B-A, C-A, B-C is omitted)

for each source destination pair we have the following tuple

(src=A, cost=None, current_list=A, dest=B, annotate=continue)
(src=A, cost=None, current_list=A, dest=C, annotate=continue)
(src=B, cost=None, current_list=B, dest=C, annotate=continue)

Now we start the map reduce algorithm

for each tuple in the tuple list we initiate:

    for each neighbor of the node at the end of current_list
        if the next neighbor is already in the current_list
            set annotate = delete
        elif the next neighbor is the dest
            set annotate = finish
            add path cost to cost
        else
            duplicate the current node
            add neighbor to current_list
            add path cost to cost
        delete the current tuple

In our case

(src=A, cost=None, current_list=A, dest=B, annotate=continue)
 =>
(src=A, cost=1, current_list=AB, dest=B, annotate=finish)
(src=A, cost=1, current_list=AC, dest=B, annotate=continue)

(src=A, cost=None, current_list=A, dest=C, annotate=continue)
=>
(src=A, cost=1, current_list=AC, dest=C, annotate=finish)
(src=A, cost=1, current_list=AB, dest=C, annotate=continue)

(src=B, cost=None, current_list=B, dest=C, annotate=continue)
=>
(src=B, cost=1, current_list=BC, dest=C, annotate=finish)
(src=B, cost=1, current_list=BA, dest=C, annotate=continue)

Reduce

Note: we reduce on src, dest pair, and use it as our key for every tuple in tuple list

if annotate == finish
    keep trace of min cost and delete tuple for each src dest pair that is not the current min
    then pass the current min as result
elif annotate == delete
    delete the tuple

else
    pass down to the next round of map

Since we still have some tuple that have annotate = continue

(src=B, cost=1, current_list=BA, dest=C, annotate=continue)  
=>
(src=B, cost=2, current_list=BAC, dest=C, annotate=finish)  
(src=B, cost=2, current_list=BAB, dest=C, annotate=delete)  


(src=A, cost=1, current_list=AC, dest=B, annotate=continue)
=>
(src=A, cost=2, current_list=ACB, dest=B, annotate=finish)
(src=A, cost=2, current_list=ACA, dest=B, annotate=delete)

(src=A, cost=1, current_list=AB, dest=C, annotate=continue)
=>
(src=A, cost=2, current_list=ABC, dest=C, annotate=finish)
(src=A, cost=2, current_list=ABA, dest=C, annotate=delete)

Reduce

We have no continue tuples, now we just use the reduce to find the min for each src dest pair

回答1:

The inner two loops of Floyd-Warshall are essentially matrix multiplication with addition replaced by min and multiplication replaced by addition. You can do matrix multiplication with a map-reduce, so you can implement Floyd Warshall with |V| map-reduces.

From the wikipedia page on Floyd-Warshall:

1 let dist be a |V| × |V| array of minimum distances initialized to ∞ (infinity)
2 for each vertex v
3    dist[v][v] ← 0
4 for each edge (u,v)
5    dist[u][v] ← w(u,v)  // the weight of the edge (u,v)
6 for k from 1 to |V|
7    for i from 1 to |V|
8       for j from 1 to |V|
9          if dist[i][j] > dist[i][k] + dist[k][j] 
10             dist[i][j] ← dist[i][k] + dist[k][j]
11         end if

The inner two loops (i and j, lines 7 to 11) are structurally the same as matrix multiplication, and you can adapt any "matrix multiplication on map-reduce" solution to perform this.

The outer (k) loop becomes |V| map-reduces.

回答2:

I would like to propose the following approach - for finding of shortest paths in graph via map-reduce.

Lets start with tiny example, which will lead to intuition regarding further implementation of algorithm.

Imagine, that information about graph is stored in form of adjacency lists (with payload, which represent paths between corresponding nodes). For example:

A -> [ {B, "A-B"}, {C, "A-C"}, {D, "A-D"} ]

From given example - we can "infer" information about the following connections in graph:

1) Direct connections

A -> B (path: "A-B")
A -> C (path: "A-C")
A -> D (path: "A-D")

2) Transitive connections through node A

B -> C (path: "B-A-C")

(where path("B -> C") == reverse(path("A -> B")) + path("A -> C"))
C -> B (path: "C-A-B")
C -> D (path: "C-A-D")
D -> C (path: "D-A-C")
D -> B (path: "D-A-B")
B -> D (path: "B-A-D")

In other words: we just "mapped" one entry of adjacency list - to multiple pairs of mutually accessible nodes (for all of generated pairs - there exists a path).

Each pair of nodes, actually represents connection: Source -> Target.

So, now, we can combine all pairs - which has the same source node:

Source -> [{Target 1, "Path-to-Target-1"}, {Target 2, "Path-to-target-2"}, ...]

Actually, after the combination - each source will be associated with a list of target nodes: list might contain duplicated target nodes (duplicated target nodes, just corresponds to different possible paths).

So, we just need to remove duplicates from list of target nodes (to keep only that target nodes, which corresponds to the shortest paths).

Two paragraphs from above - actually describes reduce step. The output of reduce step - is the same as input to map step.

So, finally - just repeat these map-reduce steps until convergence.