I asked a previous question about finding common subsequences across a variable number of sets, where no set contains a repeated character. The suggested solution was to build a matrix of every ordered pair of letters, discard the pairs that don't occur (in that order) in every set, and then find the longest path in the resulting directed acyclic graph. However, I don't want just the longest path; I want several of the longest paths (e.g. if it generates subsequences of lengths 10, 10, 9, 8, 8, 3, 3, 2, and 1, I may want to display only the first 5).
So, since I'm not looking for only the longest path, I don't use the longest-path algorithm described in the Wikipedia article to generate the resulting subsequences. Instead, I use a naive algorithm that simply lists all possible subsequences. This produces a set similar to the results in an answer to my previous question.
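Once all subsequences have been generated, keeping only the few longest is a simple selection step. Below is a minimal sketch using Python's standard `heapq.nlargest`, which the documentation describes as equivalent to `sorted(iterable, key=key, reverse=True)[:n]`. The helper name `longest_k` and the placeholder strings are hypothetical, and the sketch assumes the subsequences are already available as a list of strings.

```python
import heapq


def longest_k(subsequences, k):
    """Return the k longest subsequences, longest first (hypothetical helper)."""
    # Same result as sorted(subsequences, key=len, reverse=True)[:k],
    # so subsequences of equal length keep their original order.
    return heapq.nlargest(k, subsequences, key=len)


# Placeholder strings whose lengths are 10, 10, 9, 8, 8, 3, 3, 2 and 1.
subs = ["a" * n for n in (10, 10, 9, 8, 8, 3, 3, 2, 1)]
print([len(s) for s in longest_k(subs, 5)])  # [10, 10, 9, 8, 8]
```

This only trims what is displayed; it does not reduce how many subsequences are generated.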
The problem is that I want to reduce the number of subsequences it generates.
For example, if I have the following sets:
A = AB12034
B = BA01234
C = AB01234
... my algorithm currently comes up with the following pairs, each of which occurs in every set:
A - 1    B - 1    1 - 2    2 - 3    0 - 3    3 - 4
    2        2        3        4        4
    3        3        4
    4        4
    0        0
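The pair-building step (keep an ordered pair only if the first letter precedes the second in every set) can be sketched as follows. The function name `common_pairs` is hypothetical, and the sketch assumes, as in the question, that no set contains a repeated character, so each letter has a single position per set.

```python
from itertools import permutations


def common_pairs(sets):
    """Map each letter to the letters that follow it in every set."""
    # Position of each letter in each set (letters are assumed unique per set).
    positions = [{ch: i for i, ch in enumerate(s)} for s in sets]
    # Only letters present in every set can take part in a common subsequence.
    letters = [ch for ch in sets[0] if all(ch in p for p in positions)]
    graph = {}
    for a, b in permutations(letters, 2):
        if all(p[a] < p[b] for p in positions):
            graph.setdefault(a, []).append(b)
    return graph


graph = common_pairs(["AB12034", "BA01234", "AB01234"])
for node, succ in graph.items():
    print(node, "-", " ".join(succ))
```

For the three example sets this prints the same pairs as the listing above, except that the successors of A and B come out in the order 1 2 0 3 4.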
This is technically correct, but I would like to eliminate some of these pairs. For example, notice that 2 always comes after 1, and 1 always comes after both A and B. Therefore, I would like to eliminate the A2 and B2 pairs (i.e. A and B should never jump directly to 2; they should always go through 1 first). Also, 1 should never jump to any number besides 2, since 2 always occurs immediately after it. Furthermore, notice how 0 always occurs between B and 3, so I would like to eliminate the pair B3 (again, B should always go through 0 before it jumps to 3, since in all sets the positions of these three letters satisfy B < 0 < 3).
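The rule described above (drop a pair X-Z whenever some letter Y gives pairs X-Y and Y-Z, so that Y always lies between X and Z) matches what graph theory calls the transitive reduction of a DAG. Below is a minimal sketch of that rule, assuming the pairs are stored as a dict from each letter to its successors; `reduce_pairs` is a hypothetical name, and the adjacency is copied from the pairs listed above.

```python
def reduce_pairs(graph):
    """Drop every pair X -> Z for which some Y gives X -> Y and Y -> Z."""
    # The pairs come from "X precedes Z in every set", which is transitive,
    # so checking for a single intermediate letter Y is enough.
    succ = {x: set(ys) for x, ys in graph.items()}
    return {
        x: [z for z in ys if not any(z in succ.get(y, ()) for y in ys)]
        for x, ys in graph.items()
    }


# Pairs from the question, as listed above.
pairs = {
    "A": ["1", "2", "3", "4", "0"],
    "B": ["1", "2", "3", "4", "0"],
    "1": ["2", "3", "4"],
    "2": ["3", "4"],
    "0": ["3", "4"],
    "3": ["4"],
}
for x, ys in reduce_pairs(pairs).items():
    print(x, "-", " ".join(ys))
```

On this input, the sketch keeps A-1, A-0, B-1, B-0, 1-2, 2-3, 0-3 and 3-4. It removes A2, B2 and B3, and also every other pair that skips over an intermediate letter.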
Just to be clear, the current algorithm will come up with these subsequences (I included only those which begin with A for brevity):
A1234
A124 *
A134 *
A14 *
A234 *
A24 *
A34 *
A4 *
A034
A04 *
... and all those marked with * should be eliminated.
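The naive enumeration can be sketched as a depth-first search that yields every path from a starting letter to a letter with no successors. With the pairs above, it reproduces the ten subsequences beginning with A, in the same order. The name `all_paths` is hypothetical, and the sketch assumes single-character labels stored in a dict from each letter to its successors.

```python
def all_paths(graph, start):
    """Yield every path from `start` to a letter with no outgoing pairs."""
    successors = graph.get(start, [])
    if not successors:
        yield start  # a letter with no successors ends the path
        return
    for nxt in successors:
        for tail in all_paths(graph, nxt):
            yield start + tail


# Pairs from the question (every pair that occurs in all three sets).
pairs = {
    "A": ["1", "2", "3", "4", "0"],
    "B": ["1", "2", "3", "4", "0"],
    "1": ["2", "3", "4"],
    "2": ["3", "4"],
    "0": ["3", "4"],
    "3": ["4"],
}
print(list(all_paths(pairs, "A")))
```

Because every "skipping" pair opens an extra branch, the number of paths can grow exponentially with the number of letters.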
The (correct) pairs which generate the desired subsequences would be:
A - 1    B - 1    1 - 2    2 - 3    0 - 3    3 - 4
    0        0
... and the complete list of subsequences would be:
A1234
A034
B1234
B034
1234
234
034
34
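Running the same kind of depth-first enumeration over the corrected pairs, starting from every letter that has at least one successor, yields exactly the eight subsequences listed above, in the same order. This is a sketch: `paths_from` and `paths_from_every_start` are hypothetical names, and the adjacency is copied from the corrected pairs.

```python
def paths_from(graph, start):
    """Yield every path from `start` to a letter with no outgoing pairs."""
    successors = graph.get(start, [])
    if not successors:
        yield start
        return
    for nxt in successors:
        for tail in paths_from(graph, nxt):
            yield start + tail


def paths_from_every_start(graph):
    """Collect the paths beginning at each key (each letter with a successor)."""
    return [p for start in graph for p in paths_from(graph, start)]


# The corrected pairs from the question.
reduced = {
    "A": ["1", "0"],
    "B": ["1", "0"],
    "1": ["2"],
    "2": ["3"],
    "0": ["3"],
    "3": ["4"],
}
print(paths_from_every_start(reduced))
```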
In other words, I'm trying to go from the directed acyclic graph formed by the first set of pairs to the one formed by the corrected pairs.
What sort of algorithm/logic should I use in order to get rid of these extraneous pairs (i.e. graph edges)? Or do you think that my logic in generating the pairs in the first place is the thing that should be changed?