Difference between original Boyer–Moore and Boyer–

2020-08-09 05:10发布

问题:

I am not able to understand the changes which Horspool made in his algorithm. If you have any link of Boyer–Moore–Horspool algorithm then please do tell me.

回答1:

Here are my few observations:

BM:

Preprocessing complexity: Θ(m + σ)

Worst Case : Θ(nm) If pattern exists Θ(n+m) If pattern doesn't exist" 
Best Case : Θ(n/m) 
Space: Θ(σ) 
Comparisions: Θ(3n)

Preprocessing: Uses Good Suffix and Bad Character Shift.
At every step, it slides the pattern by the max of the slides suggested by 
the two heuristics. So it uses best of the two heuristics at every step.

Boyer Moore algorithm uses the "bad" text character itself to determine the 
shift distance.

Probably the fastest non-indexed text-search algorithm known.  
Works better in natural text and larger text searches.

A little difficult to implement.

Mostly used.

BMH:

Preprocessing complexity : Θ(m + σ)

Worst Case : Θ(nm)
Best Case : Θ(n)
Space : Θ(σ)
Comparisons : Θ(1/σ) to Θ(2/σ+1)

Preprocessing:

Recommends to Use only Bad Character Shift.
uses the rightmost character of the current text window to determine the 
shift distance.

Efficient for small alphabets

Easy to implement.


回答2:

I guess the wikipedia entry says it all.

EDIT:
According to one of the external links in the entry:

The bad-character shift used in the Boyer-Moore algorithm (see chapter Boyer-Moore algorithm) is not very efficient for small alphabets, but when the alphabet is large compared with the length of the pattern, as it is often the case with the ASCII table and ordinary searches made under a text editor, it becomes very useful. Using it alone produces a very efficient algorithm in practice. Horspool proposed to use only the bad-character shift of the rightmost character of the window to compute the shifts in the Boyer-Moore algorithm.

The preprocessing phase is in O(m+sigma) time and O(sigma) space complexity.

The searching phase has a quadratic worst case but it can be proved that the average number of comparisons for one text character is between 1/sigma and 2/(sigma+1).