Understanding the Knuth Morris Pratt(KMP) Failure

2020-07-24 05:08发布

问题:

I've been reading the Wikipedia article about the Knuth-Morris-Pratt algorithm and I'm confused about how the values are found in the jump/partial match table.

  i  |  0  1  2  3  4  5  6
W[i] |  A  B  C  D  A  B  D
T[i] | -1  0  0  0  0  1  2

If someone can more clearly explain the shortcut rule because the sentence

"let us say that we discovered a proper suffix which is a proper prefix and ending at W[2] with length 2 (the maximum possible)"

is confusing. If the proper suffix ends at W[2] wouldn't it be size of 3?

Also I'm wondering why T[4] isn't 1 when there is a prefix and suffix of size 1: The A.

Thanks for any help that can be offered.

回答1:

Notice that the failure function T[i] does not use i as an index, but rather as a length. Therefore, T[2] represents the length of the longest proper border (a string that is both a prefix and suffix) of the string formed from the first two characters of W, rather than the longest proper border formed by the string ending at character 2. This is why the maximum possible value of T[2] is 2 rather than 3 - the substring formed from the first two characters of W can't have length any greater than 2.

Using this interpretation, it's also easier to see why T[4] is 0 rather than 1. The substring of W formed from the first four characters of W is ABCD, which has no proper prefix that is also a proper suffix.

Hope this helps!



回答2:

"let us say that we discovered a proper suffix which is a proper prefix and ending at W[2] with length 2 (the maximum possible)"

Okay, the length can be maximum 2, it's correct, here is why... One fact: "proper" prefix can't be the whole string , same goes for "proper" suffix(like proper subset)

Lets, W[0]=A W[1]=A W[2]=A , i.e the pattern is "AAA", so, the (max length)proper prefix can be "AA" (left to right) and, the (max length) proper suffix can be "AA" (right to left) //yes, the prefix and suffix have overlaps (the middle "A")

So, the value would be 2 rather than 3, it would have been 3 only if the prefix was not proper.