-->

Understanding regex string matching using Dynamic

2020-07-16 02:33发布

问题:

I came across this problem that asks you to implement a regular expression matcher with support for '.' and '*', where

'.' Matches any single character.

'*' Matches zero or more of the preceding element.

isMatch("aa","a") → false
isMatch("aa","aa") → true
isMatch("aaa","aa") → false
isMatch("aa", "a*") → true
isMatch("aa", ".*") → true
isMatch("ab", ".*") → true
isMatch("aab", "c*a*b") → true

While I was able to solve this in a linear fashion, I came across lots of solutions that used DP, like the one below,

class Solution {
    public boolean isMatch(String text, String pattern) {
        boolean[][] dp = new boolean[text.length() + 1][pattern.length() + 1];
        dp[text.length()][pattern.length()] = true;

        for (int i = text.length(); i >= 0; i--){
            for (int j = pattern.length() - 1; j >= 0; j--){
                boolean first_match = (i < text.length() && 
                                       (pattern.charAt(j) == text.charAt(i) ||
                                        pattern.charAt(j) == '.'));
                if (j + 1 < pattern.length() && pattern.charAt(j+1) == '*'){
                    dp[i][j] = dp[i][j+2] || first_match && dp[i+1][j];
                } else {
                    dp[i][j] = first_match && dp[i+1][j+1];
                }
            }
        }
        return dp[0][0];
    }
}

I'm having a hard time understanding this. I've solved a few DP problems that involved grids (shortest path in 2d grid, largest square in binary 2d grid), using a DP table there made perfect sense to me. However here I'm completely lost, I'm unable to understand how traversing a 2d table helps in solving this problem. Further more it appears we know when the characters don't match in the loop, so I don't understand why we don't terminate the search there (this is also probably due to my lack of understanding of how a table traversal leads to a solution). Is there a clear intuitive explanation for problems like these?

回答1:

The intuition to solve problem like this using DP is to figure out answer to following questions

  1. Can the problem be solved using recursion? Which means can it be represented in terms of smaller sub-problem of same type?
  2. Do smaller sub-problems get repeated in the recursion tree? If yes can the result of smaller problem be stored in a manner that whenever similar sub-problem is encountered result can be accessed in O(1). This is usually called memoization.

Lets first understand the problem's solution which you would have figured out while solving in linear fashion.

  1. While matching the text with pattern either first character will match or it will not match.

    Case 1: First character matches or first character of pattern is '.'

    Case 1.1 Next character is '*'

    Case 1.2 Next character is not '*'

    Case 2: First character does not match

    case 2.1 Next character is '*'

    case 2.2 Next character is not '*'

Lets now figure out two steps discussed earlier for above problem.

For case 1.1 examples are

isMatch("a", "a*a") and isMatch("aab", "a*b") which is equivalent to solving

isMatch("a", "a") || isMatch("", "a*a") and

isMatch("aab", "b") || isMatch("ab", "a*b") respectively. First part of || condition covers the scenario of optional character in pattern as it is followed by '*' and second part covers the scenario for repetition of matched character.

Similarly sub-problems can be formed for all the cases. case 2.2 should straightaway return false.

Below is the java code with recursive approach

public boolean isMatch(String text, String pattern) {
    dp = new Boolean[text.length()][pattern.length()];
    return isMatch(text, pattern, 0, 0);
}

private boolean isMatch(String text, String pattern, int ti, int pi) {

    if (pi == pattern.length() && ti < text.length()) return false;

    if (ti == text.length() && pi == pattern.length()) return true;

    if (ti == text.length()) {
        return isNextCharStar(pattern, pi) && isMatch(text, pattern, ti, pi + 2);
    }


    boolean result;
    final boolean hasFirstMatched = text.charAt(ti) == pattern.charAt(pi) || pattern.charAt(pi) == '.';

    if (isNextCharStar(pattern, pi)) {
        result = isMatch(text, pattern, ti, pi + 2);
        if (hasFirstMatched) {
            result = result || isMatch(text, pattern, ti + 1, pi);
        }
        return result;
    }

    return hasFirstMatched && isMatch(text, pattern, ti + 1, pi + 1);

}

private boolean isNextCharStar(String pattern, int pi) {
    return pi < pattern.length() - 1 && pattern.charAt(pi + 1) == '*';
}

Now lets apply the memoization step. If we start saving the result before returning we can reuse it next time. How and where should we save it? For all possible combinations of ti and pi we have to store the result. Where ti is text Index and pi is pattern index. So a 2d array of size text.length * pattern.length should be sufficient. Following is the code after above changes

Boolean [][] dp;

public boolean isMatch(String text, String pattern) {
    dp = new Boolean[text.length()][pattern.length()];
    return isMatch(text, pattern, 0, 0);
}

private boolean isMatch(String text, String pattern, int ti, int pi) {

    if (pi == pattern.length() ) return ti == text.length();

    if (ti == text.length()) {
        return isNextCharStar(pattern, pi) && isMatch(text, pattern, ti, pi + 2);
    }

    if(dp[ti][pi] != null) return dp[ti][pi];

    boolean result;
    final boolean hasFirstMatched = text.charAt(ti) == pattern.charAt(pi) || pattern.charAt(pi) == '.';

    if (isNextCharStar(pattern, pi)) {
        result = isMatch(text, pattern, ti, pi + 2);
        if (hasFirstMatched) {
            result = result || isMatch(text, pattern, ti + 1, pi);
        }
        dp[ti][pi] = result;
        return result;
    }

    dp[ti][pi] = hasFirstMatched && isMatch(text, pattern, ti + 1, pi + 1);
    return dp[ti][pi];

}

private boolean isNextCharStar(String pattern, int pi) {
    return pi < pattern.length() - 1 && pattern.charAt(pi + 1) == '*';
}

If you will look closely only 3 lines have been changed to make it a DP solution from a recursive solution.