Eliminating extraneous edges in a directed acyclic graph

Published 2019-05-08 20:04

Question:

I previously asked a question about finding subsequences common to a variable number of sets with no repeating characters. The solution was to create a matrix of each pair of letters, discard the pairs that don't occur in every set, and then find the longest path in the resulting directed acyclic graph. However, I don't want just the longest path; I want several of the longest paths (e.g. if it generates subsequences of lengths 10, 10, 9, 8, 8, 3, 3, 2, and 1, I may want to display only the first 5 subsequences).

Since I'm not looking only for the longest path, I generate the resulting subsequences with a naive algorithm that simply lists all possible subsequences, rather than with the longest-path algorithm described in the Wikipedia article. This produces a set similar to the results in an answer to my previous question.
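For comparison, the single-longest-path step from the previous answer might be sketched as below. This is a memoized-DFS variant of the usual dynamic-programming approach for DAGs, not necessarily the exact procedure in the Wikipedia article; the `full_graph` dict (each character mapped to the set of characters that follow it in every set) and the name `longest_path` are assumptions for illustration, and the example data is the pair graph listed further down in the question.

```python
from functools import lru_cache

# Hypothetical input: character -> characters that follow it in every set.
full_graph = {
    "A": {"1", "2", "3", "4", "0"},
    "B": {"1", "2", "3", "4", "0"},
    "1": {"2", "3", "4"},
    "2": {"3", "4"},
    "0": {"3", "4"},
    "3": {"4"},
}


def longest_path(graph):
    """Return one longest path (as a string) in an acyclic successor graph."""

    @lru_cache(maxsize=None)
    def best_from(node):
        # Longest path starting at `node`. Successors are visited in sorted
        # order and max() keeps the first of several equal-length options.
        options = [node + best_from(nxt) for nxt in sorted(graph.get(node, ()))]
        return max(options, key=len) if options else node

    nodes = set(graph) | {v for succs in graph.values() for v in succs}
    return max((best_from(n) for n in sorted(nodes)), key=len)


print(longest_path(full_graph))  # A1234
```

Keeping only this one result discards every other candidate, which is why the question falls back to enumerating all paths.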

The problem is that I want to reduce the number of subsequences it generates.

For example, if I have the following sets:

A = AB12034
B = BA01234
C = AB01234

... my algorithm will currently come up with the following pairs that occur in each set:

A - 1     B - 1     1 - 2     2 - 3     0 - 3     3 - 4
    2         2         3         4         4
    3         3         4
    4         4
    0         0
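The pair-generation step could look roughly like the following sketch. The function name `common_pairs` and the use of a set of `(first, second)` tuples instead of the matrix mentioned at the top are assumptions; characters missing from any set are simply skipped.

```python
from itertools import combinations


def common_pairs(sequences):
    """Return every (x, y) such that x appears before y in all sequences.

    Assumes no sequence repeats a character; characters that are missing
    from any sequence are ignored.
    """
    # One {character: index} lookup table per sequence.
    positions = [{ch: i for i, ch in enumerate(seq)} for seq in sequences]
    # Characters present in every sequence, in the order of the first one.
    chars = [ch for ch in sequences[0] if all(ch in p for p in positions)]
    pairs = set()
    for x, y in combinations(chars, 2):
        if all(p[x] < p[y] for p in positions):
            pairs.add((x, y))
        elif all(p[y] < p[x] for p in positions):
            pairs.add((y, x))
        # Otherwise the order differs between sets, so neither pair is kept.
    return pairs


pairs = common_pairs(["AB12034", "BA01234", "AB01234"])
```

Grouping `pairs` by first element reproduces the columns of the table above (18 pairs in total).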

This is technically correct, but I would like to eliminate some of these pairs. For example, notice that 2 always comes after 1. Therefore, I would like to eliminate the A2 and B2 pairs (i.e. A and B should never jump directly to 2... they should always go through 1 first). Also, 1 should never jump to any number besides 2, since 2 always occurs immediately after it. Furthermore, notice how 0 always occurs between B and 3, so I would like to eliminate the pair B3 (again, B should always go through 0 before it jumps to 3, since all sets have the positions of these three letters as: B < 0 < 3).
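The rule described here (drop (x, z) whenever some y sits between them in every set) can be applied directly to the pair set. A minimal sketch, assuming the pairs are stored as a set of `(first, second)` tuples; `drop_bypassed_pairs` is a hypothetical name. It relies on "comes before in every set" being transitive, so x < y < z in all sets is equivalent to both (x, y) and (y, z) being pairs.

```python
def drop_bypassed_pairs(pairs):
    """Remove (x, z) whenever some y has both (x, y) and (y, z) in `pairs`."""
    nodes = {n for pair in pairs for n in pair}
    return {
        (x, z)
        for (x, z) in pairs
        if not any((x, y) in pairs and (y, z) in pairs for y in nodes)
    }


# The 18 pairs from the table above, written out by hand.
table = {"A": "12340", "B": "12340", "1": "234", "2": "34", "0": "34", "3": "4"}
pairs = {(x, y) for x, ys in table.items() for y in ys}
kept = drop_bypassed_pairs(pairs)
print(sorted(kept))
```

Because the pair set is transitively closed, this single pass removes exactly the pairs the question wants gone, leaving A-1, A-0, B-1, B-0, 1-2, 2-3, 0-3 and 3-4.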

Just to be clear, the current algorithm comes up with these subsequences (I've included only those that begin with A, for brevity):

A1234
A124  *
A134  *
A14   *
A234  *
A24   *
A34   *
A4    *
A034
A04   *

... and all those marked with * should be eliminated.
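The question doesn't show its naive algorithm, so the following is only a guess at what such an enumeration might look like: a recursive walk that yields every path from a start node to a node with no outgoing edges. `all_paths` and `full_graph` are illustrative names. Started from A on the full pair graph, it produces exactly the ten subsequences listed above; in general the number of paths can grow exponentially with the size of the graph.

```python
def all_paths(graph, start):
    """Yield every path (as a string) from `start` to a node with no successors."""
    successors = graph.get(start, ())
    if not successors:
        yield start
        return
    for nxt in sorted(successors):
        for rest in all_paths(graph, nxt):
            yield start + rest


# Pair graph from the table above: character -> characters that may follow it.
full_graph = {
    "A": {"1", "2", "3", "4", "0"},
    "B": {"1", "2", "3", "4", "0"},
    "1": {"2", "3", "4"},
    "2": {"3", "4"},
    "0": {"3", "4"},
    "3": {"4"},
}
paths_from_a = list(all_paths(full_graph, "A"))
```

Every starred entry uses at least one edge that is missing from the corrected pairs (A-2, A-3, A-4, 1-3, 1-4, 2-4 or 0-4).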

The (correct) pairs which generate the desired subsequences would be:

A - 1     B - 1     1 - 2     2 - 3     0 - 3     3 - 4
    0         0                           
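Removing these extraneous edges is what graph theory calls a transitive reduction. The sketch below assumes the graph is a dict of successor sets; `transitive_reduction` here is a hypothetical helper, not a library call. An edge (u, v) is dropped when v can also be reached from another successor of u. Applied to the full pair graph, it leaves exactly the corrected pairs above.

```python
def transitive_reduction(graph):
    """Drop each edge (u, v) that is implied by a longer path from u to v.

    `graph` maps each node to a set of direct successors and must be acyclic.
    """
    memo = {}

    def reachable(node):
        # Every node reachable from `node` by one or more edges (memoized DFS).
        if node not in memo:
            found = set()
            for nxt in graph.get(node, ()):
                found.add(nxt)
                found |= reachable(nxt)
            memo[node] = found
        return memo[node]

    return {
        u: {v for v in succs if not any(v in reachable(w) for w in succs if w != v)}
        for u, succs in graph.items()
    }


full_graph = {
    "A": {"1", "2", "3", "4", "0"},
    "B": {"1", "2", "3", "4", "0"},
    "1": {"2", "3", "4"},
    "2": {"3", "4"},
    "0": {"3", "4"},
    "3": {"4"},
}
reduced = transitive_reduction(full_graph)
```

For larger graphs, the networkx library also ships a `transitive_reduction` function for DAGs.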

... and the complete list of subsequences would be:

A1234
A034
B1234
B034
1234
234
034
34
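With the extra edges gone, this list and the "several longest paths" requirement from the top of the question can both be produced by enumerating paths from every node and sorting by length. A sketch, assuming the graph is a dict mapping each character to the set of characters that may follow it; `all_paths` and `longest_subsequences` are hypothetical names, and the graph below is the corrected one written out by hand.

```python
def all_paths(graph, start):
    """Yield every path (as a string) from `start` to a node with no successors."""
    successors = graph.get(start, ())
    if not successors:
        yield start
        return
    for nxt in sorted(successors):
        for rest in all_paths(graph, nxt):
            yield start + rest


def longest_subsequences(graph, k):
    """Return up to k paths of two or more characters, longest first."""
    nodes = set(graph) | {v for succs in graph.values() for v in succs}
    paths = [p for n in sorted(nodes) for p in all_paths(graph, n) if len(p) > 1]
    # sorted() stays stable with reverse=True, so ties keep enumeration order.
    return sorted(paths, key=len, reverse=True)[:k]


# The corrected pairs from the question.
reduced = {
    "A": {"1", "0"}, "B": {"1", "0"},
    "1": {"2"}, "2": {"3"}, "0": {"3"}, "3": {"4"},
}
print(longest_subsequences(reduced, 5))
# ['A1234', 'B1234', '1234', 'A034', 'B034']
```

With a large enough `k`, the function returns all eight subsequences listed above.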

In other words, I'm trying to go from the directed acyclic graph defined by the first list of pairs to the one defined by the corrected list.

What sort of algorithm/logic should I use in order to get rid of these extraneous pairs (i.e. graph edges)? Or do you think that my logic in generating the pairs in the first place is the thing that should be changed?

Answer 1:

"Furthermore, notice how 0 always occurs between B and 3, so I would like to eliminate the pair B3 (again, B should always go through 0 before it jumps to 3, since all sets have the positions of these three letters as: B < 0 < 3)."

Hmm, okay, so if n0 < n1 < n2 holds in all sets, then remove all (n0, n2) pairs? This can be achieved with the following (in pseudo-Python):

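for node in graph:                  # graph: node -> set of direct successors
    for succ in list(graph[node]):  # copy, because the set is modified below
        # longest_path_length (assumed helper) counts edges on the longest path
        if longest_path_length(graph, node, succ) > 1:
            graph[node].discard(succ)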
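A runnable version of this idea might look like the sketch below. It assumes the graph is a dict of successor sets and computes longest-path lengths with a memoized recursion; `longest` is a hypothetical stand-in for the answer's `LongestPath`, and the function builds a new dict instead of removing pairs in place.

```python
from functools import lru_cache

# Pair graph from the question: character -> characters that may follow it.
full_graph = {
    "A": {"1", "2", "3", "4", "0"},
    "B": {"1", "2", "3", "4", "0"},
    "1": {"2", "3", "4"},
    "2": {"3", "4"},
    "0": {"3", "4"},
    "3": {"4"},
}


def remove_redundant_edges(graph):
    """Keep (u, v) only if the longest u -> v path is that single edge."""

    @lru_cache(maxsize=None)
    def longest(u, v):
        # Edges on the longest u -> v path, or -1 if v is unreachable from u.
        if u == v:
            return 0
        best = -1
        for w in graph.get(u, ()):
            sub = longest(w, v)
            if sub >= 0:
                best = max(best, sub + 1)
        return best

    return {u: {v for v in succs if longest(u, v) == 1} for u, succs in graph.items()}


reduced = remove_redundant_edges(full_graph)
```

Since the graph is acyclic, caching on `(u, v)` makes each longest-path query cheap after the first.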


Answer 2:

The easy approach is easy, and if the graph isn't too large, it's probably efficient enough, too.

  • For each node (start with nodes without incoming edges), follow all paths, marking distances: mark all direct children with 1 and put them in a queue. While the queue is not empty, pull a node n out and let d be its distance from the start. Look at all its direct children; if any of them is marked with 1, remove the edge from the start to that child. Put all children of n into the queue, marked with distance d+1, then pull the next node from the queue.
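The steps above can be sketched as follows, assuming a dict of successor sets. Two liberties are taken: a `seen` set keeps each node from being queued more than once, and the distance labels are replaced by a direct-child check, because any child found from a dequeued node is already at least two edges from the start. The order in which start nodes are processed does not change the result. `bfs_reduce` is a hypothetical name.

```python
from collections import deque

full_graph = {
    "A": {"1", "2", "3", "4", "0"},
    "B": {"1", "2", "3", "4", "0"},
    "1": {"2", "3", "4"},
    "2": {"3", "4"},
    "0": {"3", "4"},
    "3": {"4"},
}


def bfs_reduce(graph):
    """BFS from each node, removing edges to children that are also reachable
    through another child."""
    reduced = {u: set(succs) for u, succs in graph.items()}
    for start, children in graph.items():
        direct = set(children)  # the nodes "marked with 1"
        queue = deque(children)
        seen = set(children)    # not in the answer; avoids re-queueing nodes
        while queue:
            n = queue.popleft()
            for c in graph.get(n, ()):
                # c is reachable from start via n, i.e. by 2+ edges.
                if c in direct:
                    reduced[start].discard(c)
                if c not in seen:
                    seen.add(c)
                    queue.append(c)
    return reduced


reduced = bfs_reduce(full_graph)
```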

What JSPerfUnkn0wn said, only with a bit more detail.



Answer 3:

Since the graph is acyclic, one possible solution is to apply your favorite shortest-path algorithm (Bellman-Ford, Floyd-Warshall, etc.) with the comparison condition flipped (so that longer paths win over shorter ones), and then remove every edge (u, v) whose longest u-to-v path has more than one edge.
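A sketch of this suggestion using Floyd-Warshall, assuming a dict of successor sets. Replacing `min` with `max` yields all-pairs longest path lengths (in edges). This is valid only because the graph is acyclic: joining any i-to-k path with any k-to-j path cannot repeat a node, so the result is still a simple path. An edge is kept only when its longest path is the edge itself. The function names are hypothetical, and the O(V³) running time is fine for small alphabets.

```python
NEG_INF = float("-inf")


def longest_path_lengths(graph):
    """All-pairs longest path lengths in edges; -inf means unreachable.

    Floyd-Warshall with the comparison flipped. Correct only for acyclic graphs.
    """
    nodes = sorted(set(graph) | {v for succs in graph.values() for v in succs})
    dist = {u: {v: NEG_INF for v in nodes} for u in nodes}
    for u, succs in graph.items():
        for v in succs:
            dist[u][v] = 1
    for k in nodes:
        for i in nodes:
            for j in nodes:
                through_k = dist[i][k] + dist[k][j]
                if through_k > dist[i][j]:  # keep the longer path
                    dist[i][j] = through_k
    return dist


def reduce_by_longest_paths(graph):
    dist = longest_path_lengths(graph)
    # An edge is redundant if a longer path also connects its endpoints.
    return {u: {v for v in succs if dist[u][v] == 1} for u, succs in graph.items()}


full_graph = {
    "A": {"1", "2", "3", "4", "0"},
    "B": {"1", "2", "3", "4", "0"},
    "1": {"2", "3", "4"},
    "2": {"3", "4"},
    "0": {"3", "4"},
    "3": {"4"},
}
dist = longest_path_lengths(full_graph)
reduced = reduce_by_longest_paths(full_graph)
```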