Finding the Longest Palindrome Subsequence with le

Posted 2019-03-09 12:28

Question:

I am trying to solve a dynamic programming problem from Cormen's Introduction to Algorithms, 3rd edition (pg 405), which asks the following:

A palindrome is a nonempty string over some alphabet that reads the same forward and backward. Examples of palindromes are all strings of length 1, civic, racecar, and aibohphobia (fear of palindromes).

Give an efficient algorithm to find the longest palindrome that is a subsequence of a given input string. For example, given the input character, your algorithm should return carac.

Well, I could solve it in two ways:

First solution:

The Longest Palindrome Subsequence (LPS) of a string is simply the Longest Common Subsequence (LCS) of the string and its reverse. (I built this solution after solving a related question that asks for the Longest Increasing Subsequence of a sequence.) Since it's simply an LCS variant, it also takes O(n²) time and O(n²) memory.
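
As a rough illustration of this first approach, here is a minimal Python sketch that computes only the length, by running the standard LCS table on the string and its reverse. The function names `lcs_length` and `lps_length_via_lcs` are made up for this example; it does not attempt to return the palindrome itself.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b (classic O(n*m) table)."""
    n, m = len(a), len(b)
    # table[i][j] = LCS length of a[:i] and b[:j]
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if a[i - 1] == b[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[n][m]


def lps_length_via_lcs(s: str) -> int:
    """LPS length of s, taken as the LCS length of s and its reverse."""
    return lcs_length(s, s[::-1])
```

For the book's example, `lps_length_via_lcs("character")` gives 5, the length of carac.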

Second solution:

The second solution is a bit more elaborate, but it also follows the general LCS template. It comes from the following recurrence:

lps(s[i..j]) = 
    s[i] + lps(s[i+1..j-1]) + s[j], if s[i] == s[j];
    max(lps(s[i+1..j]), lps(s[i..j-1])) otherwise

The pseudocode for calculating the length of the lps is the following:

compute-lps(s, n):

    // palindromes with length 1
    for i = 1 to n:
        c[i, i] = 1
    // palindromes with length up to 2
    for i = 1 to n-1:
        c[i, i+1] = (s[i] == s[i+1]) ? 2 : 1

    // palindromes with length up to j+1
    for j = 2 to n-1:
        for i = 1 to n-j:
            if s[i] == s[i+j]:
                c[i, i+j] = 2 + c[i+1, i+j-1]
            else:
                c[i, i+j] = max( c[i+1, i+j] , c[i, i+j-1] )
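
The pseudocode above could be turned into runnable Python roughly as follows. This is a sketch, not a definitive implementation: it uses 0-based indices instead of the 1-based ones above, the name `longest_palindromic_subsequence` is invented for the example, and the walk back through the table (which is what needs all O(n²) cells) is one possible tie-breaking choice among several.

```python
def longest_palindromic_subsequence(s: str) -> str:
    """Return one longest palindromic subsequence of s (O(n^2) time and memory)."""
    n = len(s)
    if n == 0:
        return ""
    # c[i][j] = LPS length of s[i..j] inclusive; cells with i > j stay 0,
    # which covers the empty interior when two adjacent characters match.
    c = [[0] * n for _ in range(n)]
    for i in range(n):
        c[i][i] = 1
    for gap in range(1, n):              # gap = j - i
        for i in range(n - gap):
            j = i + gap
            if s[i] == s[j]:
                c[i][j] = 2 + c[i + 1][j - 1]
            else:
                c[i][j] = max(c[i + 1][j], c[i][j - 1])

    # Walk back from c[0][n-1], collecting matched pairs from the outside in.
    left, right = [], []
    i, j = 0, n - 1
    while i <= j:
        if i == j:                       # single middle character
            left.append(s[i])
            break
        if s[i] == s[j]:
            left.append(s[i])
            right.append(s[j])
            i += 1
            j -= 1
        elif c[i + 1][j] >= c[i][j - 1]:
            i += 1
        else:
            j -= 1
    return "".join(left) + "".join(reversed(right))
```

With the book's input, `longest_palindromic_subsequence("character")` returns `"carac"`.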

It still takes O(n²) time and memory if I want to actually construct the lps (because I'll need all cells of the table). Looking at related problems such as LIS, which can be solved with approaches other than the LCS-like one and with less memory (LIS is solvable with O(n) memory), I was wondering if it's possible to solve this one with O(n) memory, too.
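
To make the distinction concrete: if only the length is wanted, the same recurrence can be evaluated while keeping just two rows of the table. The sketch below assumes 0-based indices and a hypothetical function name `lps_length_linear_space`; it does not recover the palindrome itself, which is the part that seems to need the full table.

```python
def lps_length_linear_space(s: str) -> int:
    """LPS length of s using O(n) memory (two rows of the DP table)."""
    n = len(s)
    if n == 0:
        return 0
    prev = [0] * n                 # row for start index i + 1
    for i in range(n - 1, -1, -1):
        cur = [0] * n              # row for start index i; cur[j] covers s[i..j]
        cur[i] = 1
        for j in range(i + 1, n):
            if s[i] == s[j]:
                # prev[j - 1] is the interior s[i+1..j-1]; it is 0 when j == i + 1
                cur[j] = 2 + prev[j - 1]
            else:
                cur[j] = max(prev[j], cur[j - 1])
        prev = cur
    return prev[n - 1]
```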

LIS achieves this bound by linking the candidate subsequences, but with palindromes it's harder because what matters here is not the previous element in the subsequence, but the first. Does anyone know if it is possible to do this, or are the previous solutions memory optimal?

Answer 1:

Here is a very memory-efficient version, but I haven't demonstrated that it is always O(n) memory. (With a preprocessing step it can be better than O(n²) CPU, though O(n²) is the worst case.)

Start from the left-most position. For each position, keep track of a table of the farthest out points at which you can generate reflected subsequences of length 1, 2, 3, etc. (Meaning that a subsequence to the left of our point is reflected to the right.) For each reflected subsequence we store a pointer to the next part of the subsequence.

As we work our way right, we search from the RHS of the string to the position for any occurrences of the current element, and try to use those matches to improve the bounds we previously had. When we finish, we look at the longest mirrored subsequence and we can easily construct the best palindrome.

Let's consider this for the string character.

  1. We start with our best palindrome being the letter 'c', and our mirrored subsequence being reached with the pair (0, 11) which are off the ends of the string.
  2. Next consider the 'c' at position 1. Our best mirrored subsequences in the form (length, end, start) are now [(0, 11, 0), (1, 6, 1)]. (I'll leave out the linked list you need to generate to actually find the palindrome.)
  3. Next consider the h at position 2. We do not improve the bounds [(0, 11, 0), (1, 6, 1)].
  4. Next consider the a at position 3. We improve the bounds to [(0, 11, 0), (1, 6, 1), (2, 5, 3)].
  5. Next consider the r at position 4. We improve the bounds to [(0, 11, 0), (1, 10, 4), (2, 5, 3)]. (This is where the linked list would be useful.)

Working through the rest of the list we do not improve that set of bounds.

So we wind up with the longest mirrored list having length 2. We'd follow the linked list (which I didn't record in this description) to find that it is ac. Since the ends of that list are at positions (5, 3), we can flip the list, insert character 4, then append the list to get carac.

In general the maximum memory that it will require is to store all of the lengths of the maximal mirrored subsequences plus the memory to store the linked lists of said subsequences. Typically this will be a very small amount of memory.

As a classic memory/CPU tradeoff, you can preprocess the list once in time O(n) to generate an O(n)-sized hash of arrays of where specific sequence elements appear. This lets you scan for "improve mirrored subsequence with this pairing" without having to consider the whole string, which should generally be a major saving on CPU for longer strings.
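
The preprocessing step might look like the following sketch. It covers only the index of positions, not the mirrored-subsequence search itself; the name `build_position_index` is an assumption for illustration, and positions are 1-based to match the walkthrough above.

```python
from collections import defaultdict


def build_position_index(s: str) -> dict[str, list[int]]:
    """Map each character to the sorted list of 1-based positions where it occurs."""
    positions = defaultdict(list)
    for index, ch in enumerate(s, start=1):
        positions[ch].append(index)   # built in one O(n) pass, O(n) total size
    return dict(positions)
```

For character, the index maps 'c' to [1, 6], 'a' to [3, 5], and 'r' to [4, 9], so a scan for partners of the current element only has to look at that element's own list.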



Answer 2:

The first solution in @Luiz Rodrigo's question is wrong: the Longest Common Subsequence (LCS) of a string and its reverse is not necessarily a palindrome.

Example: for the string CBACB, CAB is an LCS of the string and its reverse, and it's obviously not a palindrome. There is a way, however, to make it work. After the LCS of a string and its reverse is built, take its left half (including the middle character if the LCS has odd length) and complement it on the right with the reversed left half (not including the middle character if the LCS has odd length). The result will obviously be a palindrome, and it can be trivially proven that it will be a subsequence of the string.

For above LCS, the palindrome built this way will be CAC.
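
A minimal Python sketch of this repair, assuming a standard table-based LCS with backtracking (the helper names `lcs` and `palindrome_from_lcs` are invented for the example). Which LCS the backtracking returns depends on tie-breaking, so the exact palindrome may differ from CAC, but its length should not.

```python
def lcs(a: str, b: str) -> str:
    """Return one longest common subsequence of a and b."""
    n, m = len(a), len(b)
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if a[i - 1] == b[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    # Backtrack from the bottom-right corner to recover the characters.
    out = []
    i, j = n, m
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1])
            i -= 1
            j -= 1
        elif table[i - 1][j] >= table[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))


def palindrome_from_lcs(s: str) -> str:
    """Mirror the left half of LCS(s, reversed s) to obtain a palindrome."""
    common = lcs(s, s[::-1])
    half = (len(common) + 1) // 2          # left half, middle character included
    left = common[:half]
    mirrored = left[:len(common) // 2][::-1]  # middle character excluded if odd
    return left + mirrored
```

For CBACB this yields a palindrome of length 3, and for character one of length 5.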