Efficiently find all connected induced subgraphs

2019-01-23 17:38发布

问题:

Is there an efficient(*) algorithm to find all the connected (induced) subgraphs of a connected undirected vertex-labelled graph?

(*) I appreciate that, in the general case, any such algorithm may have O(2^n) complexity because, for a clique (Kn), there are 2^n connected subgraphs. However, the graphs I'm typically dealing with will have far fewer connected subgraphs, so I'm looking for a way to generate them without having to consider all 2^n subgraphs and throw away those that aren't connected (as in the solution to this question).

An algorithm that has running time that's linear in the number of the solutions (i.e. it can generate the solutions directly from the structure of the graph without wasting time considering all the non-solutions) would obviously be ideal. An additional step that's polynomial in the number of nodes would be fine too (e.g. pre-computing the transitive closure of the graph - not that I think that would help in this case).

Alternatively, a proof that there is no such solution would at least put me out of my misery.

回答1:

In recursive pseudocode, the 2^n algorithm is

GenerateAndTest(verticesNotYetConsidered, subsetSoFar):
    if verticesNotYetConsidered is empty:
        yield subsetSoFar if it induces a connected subgraph
    else:
        choose a vertex v in verticesNotYetConsidered
        GenerateAndTest(verticesNotYetConsidered - {v}, subsetSoFar)
        GenerateAndTest(verticesNotYetConsidered - {v}, subsetSoFar union {v})

It doesn't matter which vertex v is chosen; we even can choose differently in two sibling calls. We exploit this freedom to obtain an almost linear-time algorithm (n times the number of solutions) by pruning the recursion tree.

If subsetSoFar is empty, then the choice is still unconstrained. Otherwise, we choose v to be adjacent to one of the vertices in subsetSoFar. If no such v exists, we yield subsetSoFar and return, since there are no other solutions in this subtree.

Note the new invariant that subsetSoFar is always connected, so we can eliminate the explicit connectivity test. We do O(n) work at each node of the recursion tree (naively O(n^2) but we can maintain the set of adjacent vertices incrementally), which is complete binary and whose leaves each yield exactly one solution, so the total work is as claimed (recall that the number of internal nodes is one less than the number of leaves).

On account of the new invariant, no disconnected subgraph is yielded.

Each connected subgraph is yielded. For a set of vertices S that induces a connected subgraph, follow the branches that agree with S. It's not possible for a proper subset of S to have no adjacency to the rest of S, so S is not pruned.

The new pseudocode is as follows. N(v) denotes the set of neighbors of v.

GenerateConnectedSubgraphs(verticesNotYetConsidered, subsetSoFar, neighbors):
    if subsetSoFar is empty:
        let candidates = verticesNotYetConsidered
    else
        let candidates = verticesNotYetConsidered intersect neighbors
    if candidates is empty:
        yield subsetSoFar
    else:
        choose a vertex v from candidates
        GenerateConnectedSubgraphs(verticesNotYetConsidered - {v},
                                   subsetSoFar,
                                   neighbors)
        GenerateConnectedSubgraphs(verticesNotYetConsidered - {v},
                                   subsetSoFar union {v},
                                   neighbors union N(v))

EDIT: for graphs of max degree O(1), we can make this truly linear-time by maintaining verticesNotYetConsidered intersect neighbors, which I didn't do for the sake of clarity. This optimization probably isn't worth much if you exploit word-parallelism by representing the graph as an adjacency matrix where each row is stored as a one- or two-word bitmap.