What are the available string matching algorithms besides Knuth-Morris-Pratt, Rabin-Karp and likes of it?
可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效,请关闭广告屏蔽插件后再试):
问题:
回答1:
A well cited compendium of these algorithms can be found in:
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.133.4896&rep=rep1&type=pdf
Included are the following algorithms:
Karp-Rabin
Shift Or
Morris-Pratt
Knuth-Morris-Pratt
Simon
Colussi
Galil-Giancarlo
Apostolico-Crochemore
Not So Naive
Forward Dawg Matching
Boyer-Moore
Turbo-BM
Apostolico-Giancarlo
Reverse Colussi
Horspool
Quick Search
Tuned Boyer-Moore
Zhu-Takaoka
Berry-Ravindran
Smith
Raita
Reverse Factor
Turbo Reverse Factor
Backward Oracle Matching
plus about 15 others.
BTW, you might want to clarify if you are also interested in string similarity algorithms (e.g., Levenshtein distance, etc), which are closely related, if you are indeed interested in that.
回答2:
This page has good brief descriptions of many algorithms: http://www-igm.univ-mlv.fr/~lecroq/string/index.html
回答3:
Simone Faro and Thierry Lecroq provide a C implementation and references to 86 exact string matching algorithms on "String Matching Algorithms Research Tool" .
They also provide a framework to benchmark the string matching algorithms:
____________________________________________________________
Experimental results on englishTexts
Searching for a set of 200 patterns with length 128
Testing 5 algorithms
- [1/5] BM ..................[OK] 0.88 ms
- [2/5] EPSM ................[OK] 0.38 ms
- [3/5] KMP .................[OK] 6.23 ms
- [4/5] KR ..................[OK] 1.84 ms
- [5/5] TW ..................[OK] 2.70 ms
algorithms
- BM = Boyer Moore (1977)
- EPSM = Exact Packed String Matching algorithm (2013)
- KMP = Knuth Morris-Pratt (1977)
- KR = Karp Rabin (1987)
- TW = Two Way (1991)