Knuth-Morris-Pratt Fail table

2019-04-02 12:17发布

问题:

I am studying for an exam I have and I am looking over the Knuth-Morris-Pratt algorithm. What is going to be on the exam is the Fail table and DFA construction. I understand DFA construction, but I don't really understand how to make the fail table.

If I have an example of a pattern "abababc" how do I build a fail table from this? The solution is:

Fail table:

0 1 2 3 4 5 6 7

0 0 0 1 2 3 4 0

but how do I get that? No code just an explanation of how to get that is necessary.

回答1:

The value of cell i in the fail table for string s is defined as follows: take the substring of s that ends at position i, and the value in the cell is the length of the longest proper(not the whole string) sufix of this substring that is equal to its prefix of the same length.

Let's take your example and consider the value for 6. The substring of s with length 6 is ababab. It has 6 suffixes: babab, abab, bab, ab and b on the other hand its proper prefixes are ababa, abab, aba, ab and a. Now it is easy to see that the sufixes that are equal to prefixes of the same length are abab and ab. Of these the longer is abab and thus the value in cell 6 is the its length - 4.



回答2:

Pattern P = {abababc} P[0] = 'a'. P[1] = 'b'. P[2] = 'a'. P[3] = 'b'. P[4] = 'a'. P[5] = 'b'. P[6] = 'c'.

The motive of the Fail Table is to identify the maximum possible shift (such that we would not miss out on any pattern matching, but would also not make unnecessary comparison), if first "i" character of the pattern string are matching and the break is found at the i+1 th character.

The number in the Fail Table indicates how many character still continues to match after the shift if the first i character of the pattern matches to the text.

Let FailTable be FT[].

FT[1] - 'a' matches with text. Break found at 'b'(P[1]). Do we have a proper suffix of 'a' which matches the proper prefix of 'a'? Ans is NO. So length of the String which still continues to match after the shift is 0. Hence FT[1] = 0.

FT[2] - 'ab' matches with text. Break found at 'a' (P[2]). Do we have a proper suffix of 'ab' which matches the proper prefix of 'ab'? Ans is NO. So length of the String which still continues to match after the shift is 0. Hence FT[2] = 0.

FT[3] - 'aba' matches with text. Break found at 'b' (P[3]). Do we have a proper suffix of 'aba' which matches the proper prefix of 'aba'? Ans is YES ('a'). So length of the String which still continues to match after the shift is 1. Hence FT[3] = 1.

FT[4] - 'abab' matches with text. Break found at 'a' (P[4]). Do we have a proper suffix of 'abab' which matches the proper prefix of 'abab'? Ans is YES('ab'). So length of the String which still continues to match after the shift is 2. Hence FT[4] = 2.

FT[5] - 'ababa' matches with text. Break found at 'b' (P[5]). Do we have a proper suffix of 'ababa' which matches the proper prefix of 'ababa'? Ans is YES('aba'). So length of the String which still continues to match after the shift is 3. Hence FT[5] = 3.

FT[6] - 'ababab' matches with text. Break found at 'a' (P[6]). Do we have a proper suffix of 'ababab' which matches the proper prefix of 'ababab'? Ans is YES('abab'). So length of the String which still continues to match after the shift is 4. Hence FT[6] = 4.

FT[7] - 'abababc' matches with text. No break found at all, Pattern matched with the text. Do we have a proper suffix of 'abababc' which matches the proper prefix of 'abababc'? Ans is NO. So length of the String which still continues to match after the shift is 0. Hence FT[7] = 0.

Hence the final array is FT = [0,0,1,2,3,4,0]

Hope it helps!