We can find a minimum bottleneck spanning tree in O(E log*V) in the worst case by using Kruskal's algorithm. This is because every minimum spanning tree is a minimum bottleneck spanning tree.
But I got stuck on this job-interview question from this course.
How can we find a minimum bottleneck spanning tree in linear time even in the worst case. Note that we can assume that we can compute the median of n keys in linear time in the worst case.
- Get
V
, the median of the weights of the |E| edges.
- Find all edge's value no more than
V
, and get the subgraph
- If the subgraph is connected,
V
is the upper limit of the answer, and decrease the V
, repeat the step 1, 2.
- If the subgraph is not connected, let the connected component to become a node, and increase the
V
, repeat the step 1, 2.
Then you can solve the question in linear time.
PS: Using DFS to judge the subgraph is connected. and the complexity is O(E/2) + O(E/4) + O(E/8) + ... = O(E)
The standard algorithm for finding Minimum Bottleneck Spanning Tree (MBST) is known as Camerini’s algorithm. It runs in linear time and is as follows:
1. Find a median edge weight in graph and partition all edges in to two
partitions A and B around it. Let A have all edges greater than
pivot and B has all edges less than or equal to pivot.
2. Run DFS or BFS on partition B. If it connected graph then again run
step 1 on it.
3. If partition B wasn't a connected graph then we must have more than
1 connected components in it. Create a new graph by contracting each
of these connected components as a single vertex and select edges
from partition A that connects these components. MBST is given by
edges in connected components + MBST of new graph.
In pseudo-code:
1: procedure MBST(V, E)
2: if |E| = 1 then
3: Return E
4: else
5: A, B ← Partition E in two halves around median
6: A is higher half, B is lower half
7: F ← Connected components of B
8: if |F| = 1 and spans all vertices then
9: Return MBST(V, B)
10: else
11: V' ← create one vertex for each connected component in F
12: plus vertices missing from F
13: B' ← Edges from A that connects components in F
14: and edges to vertices not in F
15: Return F ∪ MBST(V', B')
16: end if
17: end if
18: end procedure
Implementation notes:
- Median can be found in O(n).
- Line 7 can generate disjoint-set data structure using BFS or DFS.
- Line 13 involves filtering out edges in A where each edge has endpoints that are either in two different connected components in F or one endpoint is vertex not in F and other is in F or both are not in F. This tests can be done using efficient disjoint-set data structure in O(1) amortized time for each edge.
Update: I've now also created Wikipedia page on this topic.
I found the other answers lacking detail, confusing or plain wrong. For example, ShitalShah's answer states:
Create a new graph by contracting each of these connected components
as a single vertex and select edges from partition A that connects
these components
Then in his pseudocode later:
11: V' ← create one vertex for each connected component in F
12: plus vertices missing from F
13: B' ← Edges from A that connects components in F
14: and edges to vertices not in F
"vertices missing from F" or "edges to vertices not in F" is not mentioned in the description immediately preceding the pseudocode. If we let that slide, we soon find more discrepancies:
Line 13 involves filtering out edges in A where each edge has
endpoints that are either in two different connected components in F
or one endpoint is vertex not in F and other is in F or both are not
in F.
What? Description and pseudocode were talking about connecting the different connected components using the larger edges, and now we are suddenly filtering them out???
There's more: The link to the actual algorithm returns a 403 forbidden. The Wikipedia article is fraught with similar discrepancies. How do we create a super vertex when they completely degenerate into one vertex per connected component? How to handle parallel edges from partition A after contraction-which one should we choose? $&T^$#)*)
I believe when attempting to provide an answer, we should assume that the reader knows where we live. Thus I present working code, probably the only one found on the web. Drumroll.
public class MBST<E> {
private static final NumberFormat formatter = NumberFormat.getNumberInstance(Locale.ENGLISH);
static {
formatter.setMinimumFractionDigits(2);
formatter.setMaximumFractionDigits(2);
}
private final UndirectedGraph<E> mbst;
public MBST(UndirectedGraph<E> g) {
System.out.printf("Graph:%n%s", g);
if (g.numEdges() <= 1) {
mbst = g;
return;
}
// can we calculate mean more efficiently?
DoubleSummaryStatistics stats = g.edges().stream()
.collect(summarizingDouble(e -> e.weight));
System.out.printf("Mean: %s%n", formatter.format(stats.getAverage()));
Map<Boolean, List<Edge<E>>> edgeMap = g.edges().stream()
.collect(groupingBy(e -> e.weight < stats.getAverage()));
List<Edge<E>> f = edgeMap.getOrDefault(true, emptyList());
if (f.isEmpty()) {
mbst = g;
return;
}
UndirectedGraph<E> b = new UndirectedGraph<>(f);
ConnectedComponents<E> cc = new ConnectedComponents<>(b);
if (cc.size == 1 && b.numVertices() == g.numVertices()) {
mbst = new MBST<>(b).mbst;
return;
}
Map<Edge<E>, Edge<E>> bPrime = new HashMap<>();
edgeMap.get(false)
.forEach(e1 -> {
boolean vInB = b.containsVertex(e1.v);
boolean wInB = b.containsVertex(e1.w);
boolean edgeInB = vInB && wInB;
E v = vInB ? cc.id(e1.v) : e1.v;
E w = wInB ? cc.id(e1.w) : e1.w;
Edge<E> e2 = new Edge<>(v, w);
bPrime.compute(e2, (key, val) -> {
// same connected component
if (edgeInB && v.equals(w)) return null;
if (val == null || val.weight > e1.weight) return e1;
return val;
});
});
mbst = new MBST<>(new UndirectedGraph<>(bPrime.values())).mbst;
for (Edge<E> e : f) mbst.addEdge(e);
}
public Iterable<Edge<E>> edges() {
return mbst.edges();
}
}
I tested it against the following graphs. The first image is from the Wikipedia article, the second from this paper.
Graph:
Vertices = [A, B, C, D, E, F, G]
Edges = [(A, B), (B, C), (D, E), (E, F), (F, G), (B, D), (C, E), (D, F), (E, G), (A, D), (B, E)]
Mean: 8.00
Graph:
Vertices = [A, B, C, D, E, F, G]
Edges = [(A, B), (C, E), (D, F), (E, G), (A, D), (B, E)]
Mean: 6.17
Graph:
Vertices = [A, B, E, G]
Edges = [(A, B), (E, G), (B, E)]
Mean: 7.00
Graph:
Vertices = [A, B, C, D, E, F, G]
Edges = [(A, B), (B, C), (C, D), (D, E), (F, G), (C, E), (D, F), (E, G), (B, G)]
Mean: 10.67
Graph:
Vertices = [E, G]
Edges = [(E, G)]