Subgraph enumeration

2019-02-12 18:31发布

问题:

What is an efficient algorithm for the enumeration of all subgraphs of a parent graph. In my particular case, the parent graph is a molecular graph, and so it will be connected and typically contain fewer than 100 vertices.

Edit: I am only interested in the connected subgraphs.

回答1:

This question has a better answer in the accepted answer to this question. It avoids the computationally complex step marked "you fill in above function" in @ninjagecko's answer. It can deal efficiently with compounds where there are several rings.

See the linked question for the full details, but here's the summary. (N(v) denotes the set of neighbors of vertex v. In the "choose a vertex" step, you can choose any arbitrary vertex.)

GenerateConnectedSubgraphs(verticesNotYetConsidered, subsetSoFar, neighbors):
    if subsetSoFar is empty:
        let candidates = verticesNotYetConsidered
    else
        let candidates = verticesNotYetConsidered intersect neighbors
    if candidates is empty:
        yield subsetSoFar
    else:
        choose a vertex v from candidates
        GenerateConnectedSubgraphs(verticesNotYetConsidered - {v},
                                   subsetSoFar,
                                   neighbors)
        GenerateConnectedSubgraphs(verticesNotYetConsidered - {v},
                                   subsetSoFar union {v},
                                   neighbors union N(v))


回答2:

What is an efficient algorithm for the enumeration of all subgraphs of a parent graph. In my particular case, the parent graph is a molecular graph, and so it will be connected and typically contain fewer than 100 vertices.

Comparison with mathematical subgraphs:

You could give each element a number from 0 to N, then enumerate each subgraph as any binary number of length N. You wouldn't need to scan the graph at all.

If what you really want is subgraphs with a certain property (fully connected, etc.) that is different, and you'd need to update your question. As a commentor noted, 2^100 is very large, so you definitely don't want to (like above) enumerate the mathematically-correct-but-physically-boring disconnected subgraphs. It would literally take you, assuming a billion enumerations per second, at least 40 trillion years to enumerate them all.

Connected-subgraph-generator:

If you want some kind of enumeration that retains the DAG property of subgraphs under some metric, e.g. (1,2,3)->(2,3)->(2), (1,2,3)->(1,2)->(2), you'd just want an algorithm that could generate all CONNECTED subgraphs as an iterator (yielding each element). This can be accomplished by recursively removing a single element at a time (optionally from the "boundary"), checking if the remaining set of elements is in a cache (else adding it), yielding it, and recursing. This works fine if your molecule is very chain-like with very few cycles. For example if your element was a 5-pointed star of N elements, it would only have about (100/5)^5 = 3.2million results (less than a second). But if you start adding in more than a single ring, e.g. aromatic compounds and others, you might be in for a rough ride.

e.g. in python

class Graph(object):
    def __init__(self, vertices):
        self.vertices = frozenset(vertices)
        # add edge logic here and to methods, etc. etc.

    def subgraphs(self):
        cache = set()
        def helper(graph):
            yield graph
            for element in graph:
                if {{REMOVING ELEMENT WOULD DISCONNECT GRAPH}}:
                    # you fill in above function; easy if
                    # there is 0 or 1 ring in molecule
                    # (keep track if molecule has ring, e.g.
                    #  self.numRings, maybe even more data)
                    # if you know there are 0 rings the operation
                    #  takes O(1) time
                    continue
                subgraph = Graph(graph.vertices-{element})
                if not subgraph in cache:
                    cache.add(subgraph)
                    for s in helper(subgraph):
                        yield s
        for graph in helper(self):
            yield graph

    def __eq__(self, other):
        return self.vertices == other.vertices
    def __hash__(self):
        return hash(self.vertices)
    def __iter__(self):
        return iter(self.vertices)
    def __repr__(self):
        return 'Graph(%s)' % repr(set(self.vertices))

Demonstration:

G = Graph({1,2,3,4,5})

for subgraph in G.subgraphs():
    print(subgraph)

Result:

Graph({1, 2, 3, 4, 5})                                                                                                                                                                                                                                              
Graph({2, 3, 4, 5})
Graph({3, 4, 5})
Graph({4, 5})
Graph({5})
Graph(set())
Graph({4})
Graph({3, 5})
Graph({3})
Graph({3, 4})
Graph({2, 4, 5})
Graph({2, 5})
Graph({2})
Graph({2, 4})
Graph({2, 3, 5})
Graph({2, 3})
Graph({2, 3, 4})
Graph({1, 3, 4, 5})
Graph({1, 4, 5})
Graph({1, 5})
Graph({1})
Graph({1, 4})
Graph({1, 3, 5})
Graph({1, 3})
Graph({1, 3, 4})
Graph({1, 2, 4, 5})
Graph({1, 2, 5})
Graph({1, 2})
Graph({1, 2, 4})
Graph({1, 2, 3, 5})
Graph({1, 2, 3})
Graph({1, 2, 3, 4})