[NOTE: I searched beforehand and couldn't find advice on solving the LCS problem for all subsequences.]
I'm having trouble coding up a solution to the "longest common subsequence" problem where I return all the longest common subsequences for two input Strings. I looked at the Wikipedia page and tried to implement the pseudo-code on there, but ran into a problem with my "backtrackAll" method. I believe I'm computing the LCS matrix correctly below, but my "backtrackAll" method returns an empty set. Any tips on what I'm doing wrong?
public static void main (String[] args) {
String s1 = "AGCAT";
String s2 = "GAC";
int[][] matrix = computeMatrix(s1,s2);
HashSet<String> set = backtrackAll(matrix,s1,s2,s1.length(),s2.length());
//more stuff would go here...
}
private static int[][] computeMatrix(String s1, String s2) {
int[][] C = new int[s1.length()+1][s2.length()+1];
for (int i = 1; i < s1.length()+1; i++) {
for (int j = 1; j < s2.length()+1; j++) {
if (s1.charAt(i-1) == s2.charAt(j-1)) {
C[i][j] = C[i-1][j-1] + 1;
} else {
C[i][j] = Math.max(C[i][j-1], C[i-1][j]);
}
}
}
return C;
}
private static HashSet<String> backtrackAll(int[][] C, String s1, String s2, int i, int j) {
if (i == 0 || j == 0) {
return new HashSet<String>();
} else if (s1.charAt(i-1) == s2.charAt(j-1)) {
HashSet<String> R = backtrackAll(C,s1,s2,i-1,j-1);
HashSet<String> new_set = new HashSet<String>();
for (String Z: R) {
new_set.add(Z + s1.charAt(i-1));
}
return new_set;
} else {
HashSet<String> R = new HashSet<String>();
if (C[i][j-1] >= C[i-1][j]) {
R = backtrackAll(C, s1, s2, i, j-1);
}
if (C[i-1][j] >= C[i][j-1]) {
R.addAll(backtrackAll(C,s1,s2,i-1,j));
}
return R;
}
}
Since this is "homework", here are a couple of tips.
Make sure that you understand the algorithm that you have coded. This is probably the single most important step in figuring out what is wrong with your implementation.
Try using a debugger to figure out what is going on. Compare what you think should be happening with what is actually happening.
Try adding some
assert
statements to the code to check preconditions, postconditions and invariants that you believe should hold true. (Run withjava -ea ...
)Stick to the normal Java naming conventions. Variable names start with a lowercase letter. No underscores in variable names.
The second answer prints everything but not only the longest, mine is correct.
I modified it a bit. It now works. You should also consider when a null
HashSet
is returned in whose case the last matched character has to be added.Here are two versions in C# to get to the longest common subsequences (you may refer to: http://codingworkout.blogspot.com/2014/07/longest-common-subsequence.html)
Backtracking based on cached table which contains the length of longest common subsequences
While caching, instead of caching, legths, capturing the lcs itself.
Version 1 (backtracking based on lcs length of prefixes):
**where the caller of the recursive function is **
version 2 - where the cache captures the lcs of all prefixes
Unit Tests