How to compute a minimum bottleneck spanning tree

2019-03-14 04:06发布

问题:

We can find a minimum bottleneck spanning tree in O(E log*V) in the worst case by using Kruskal's algorithm. This is because every minimum spanning tree is a minimum bottleneck spanning tree.

But I got stuck on this job-interview question from this course.

How can we find a minimum bottleneck spanning tree in linear time even in the worst case. Note that we can assume that we can compute the median of n keys in linear time in the worst case.

回答1:

  1. Get V, the median of the weights of the |E| edges.
  2. Find all edge's value no more than V, and get the subgraph
    • If the subgraph is connected, V is the upper limit of the answer, and decrease the V, repeat the step 1, 2.
    • If the subgraph is not connected, let the connected component to become a node, and increase the V, repeat the step 1, 2.

Then you can solve the question in linear time.

PS: Using DFS to judge the subgraph is connected. and the complexity is O(E/2) + O(E/4) + O(E/8) + ... = O(E)



回答2:

The standard algorithm for finding Minimum Bottleneck Spanning Tree (MBST) is known as Camerini’s algorithm. It runs in linear time and is as follows:

 1. Find a median edge weight in graph and partition all edges in to two
    partitions A and B around it. Let A have all edges greater than
    pivot and B has all edges less than or equal to pivot.                
 2. Run DFS or BFS on partition B. If it connected graph then again run
    step 1 on it.        
 3. If partition B wasn't a connected graph then we must have more than
    1 connected components in it. Create a new graph by contracting each
    of these connected components as a single vertex and select edges
    from partition A that connects these components. MBST is given by
    edges in connected components + MBST of new graph.

In pseudo-code:

1:  procedure MBST(V, E)
2:      if |E| = 1 then
3:          Return E 
4:      else
5:          A, B ←  Partition E in two halves around median
6:                  A is higher half, B is lower half
7:          F ← Connected components of B
8:          if |F| = 1 and spans all vertices then
9:              Return MBST(V, B)
10:         else
11:             V' ← create one vertex for each connected component in F
12:                     plus vertices missing from F
13:             B' ← Edges from A that connects components in F
14:                     and edges to vertices not in F
15:             Return F ∪ MBST(V', B') 
16:         end if
17:     end if
18: end procedure

Implementation notes:

  1. Median can be found in O(n).
  2. Line 7 can generate disjoint-set data structure using BFS or DFS.
  3. Line 13 involves filtering out edges in A where each edge has endpoints that are either in two different connected components in F or one endpoint is vertex not in F and other is in F or both are not in F. This tests can be done using efficient disjoint-set data structure in O(1) amortized time for each edge.

Update: I've now also created Wikipedia page on this topic.



回答3:

I found the other answers lacking detail, confusing or plain wrong. For example, ShitalShah's answer states:

Create a new graph by contracting each of these connected components as a single vertex and select edges from partition A that connects these components

Then in his pseudocode later:

11:             V' ← create one vertex for each connected component in F
12:                     plus vertices missing from F
13:             B' ← Edges from A that connects components in F
14:                     and edges to vertices not in F

"vertices missing from F" or "edges to vertices not in F" is not mentioned in the description immediately preceding the pseudocode. If we let that slide, we soon find more discrepancies:

Line 13 involves filtering out edges in A where each edge has endpoints that are either in two different connected components in F or one endpoint is vertex not in F and other is in F or both are not in F.

What? Description and pseudocode were talking about connecting the different connected components using the larger edges, and now we are suddenly filtering them out???

There's more: The link to the actual algorithm returns a 403 forbidden. The Wikipedia article is fraught with similar discrepancies. How do we create a super vertex when they completely degenerate into one vertex per connected component? How to handle parallel edges from partition A after contraction-which one should we choose? $&T^$#)*)

I believe when attempting to provide an answer, we should assume that the reader knows where we live. Thus I present working code, probably the only one found on the web. Drumroll.

public class MBST<E> {
    private static final NumberFormat formatter = NumberFormat.getNumberInstance(Locale.ENGLISH);

    static {
        formatter.setMinimumFractionDigits(2);
        formatter.setMaximumFractionDigits(2);
    }

    private final UndirectedGraph<E> mbst;

    public MBST(UndirectedGraph<E> g) {
        System.out.printf("Graph:%n%s", g);

        if (g.numEdges() <= 1) {
            mbst = g;
            return;
        }

        // can we calculate mean more efficiently?
        DoubleSummaryStatistics stats = g.edges().stream()
                .collect(summarizingDouble(e -> e.weight));
        System.out.printf("Mean: %s%n", formatter.format(stats.getAverage()));
        Map<Boolean, List<Edge<E>>> edgeMap = g.edges().stream()
                .collect(groupingBy(e -> e.weight < stats.getAverage()));

        List<Edge<E>> f = edgeMap.getOrDefault(true, emptyList());
        if (f.isEmpty()) {
            mbst = g;
            return;
        }

        UndirectedGraph<E> b = new UndirectedGraph<>(f);
        ConnectedComponents<E> cc = new ConnectedComponents<>(b);

        if (cc.size == 1 && b.numVertices() == g.numVertices()) {
            mbst = new MBST<>(b).mbst;
            return;
        }

        Map<Edge<E>, Edge<E>> bPrime = new HashMap<>();
        edgeMap.get(false)
                .forEach(e1 -> {
                    boolean vInB = b.containsVertex(e1.v);
                    boolean wInB = b.containsVertex(e1.w);
                    boolean edgeInB = vInB && wInB;
                    E v = vInB ? cc.id(e1.v) : e1.v;
                    E w = wInB ? cc.id(e1.w) : e1.w;

                    Edge<E> e2 = new Edge<>(v, w);

                    bPrime.compute(e2, (key, val) -> {
                        // same connected component
                        if (edgeInB && v.equals(w)) return null;
                        if (val == null || val.weight > e1.weight) return e1;
                        return val;
                    });
                });
        mbst = new MBST<>(new UndirectedGraph<>(bPrime.values())).mbst;
        for (Edge<E> e : f) mbst.addEdge(e);
    }

    public Iterable<Edge<E>> edges() {
        return mbst.edges();
    }
}

I tested it against the following graphs. The first image is from the Wikipedia article, the second from this paper.

Graph:
Vertices = [A, B, C, D, E, F, G]
Edges = [(A, B), (B, C), (D, E), (E, F), (F, G), (B, D), (C, E), (D, F), (E, G), (A, D), (B, E)]
Mean: 8.00
Graph:
Vertices = [A, B, C, D, E, F, G]
Edges = [(A, B), (C, E), (D, F), (E, G), (A, D), (B, E)]
Mean: 6.17
Graph:
Vertices = [A, B, E, G]
Edges = [(A, B), (E, G), (B, E)]
Mean: 7.00

Graph:
Vertices = [A, B, C, D, E, F, G]
Edges = [(A, B), (B, C), (C, D), (D, E), (F, G), (C, E), (D, F), (E, G), (B, G)]
Mean: 10.67
Graph:
Vertices = [E, G]
Edges = [(E, G)]