When to use Rabin-Karp or KMP algorithms?

2019-03-17 13:51发布

问题:

I have generated an string using the following alphabet. {A,C,G,T}. And my string contains more than 10000 characters. I'm searching the following patterns in it.

  • ATGGA
  • TGGAC
  • CCGT

I have asked to use a string matching algorithm which has O(m+n) running time.

m = pattern length
n = text length

Both KMP and Rabin-Karp algorithms have this running time. What is the most suitable algorithm (between Rabin-Carp and KMP) in this situation?

回答1:

When you want to search for multiple patterns, typically the correct choice is to use Aho-Corasick, which is somewhat a generalization of KMP. Now in your case you are only searching for 3 patterns so it may be the case that KMP is not that much slower(at most three times), but this is the general approach.

Rabin-Karp is easier to implement if we assume that a collision will never happen, but if the problem you have is a typical string searching KMP will be more stable no matter what input you have. However, Rabin-Karp has many other applications, where KMP is not an option.