I need to compare a string to multiple other constant strings in C. I am curious which is faster: hashing the string I am going to compare and comparing that hash against the hashes of all the constant strings, or just comparing the strings as strings. Thank you in advance.
Thank you for the answers. I am going to be doing many comparisons. Can anyone suggest a good, fast, low-resource algorithm to use? The only hash I know of is MD5, and I have a feeling that is overkill.
I also want to add that the strings are maybe 20 or 30 characters long at most, with most being around 7.
MurmurHash is simple, fast, and behaves well on statistical tests.
If you are trying to match a subject string against a set of other strings, you might consider using the Aho-Corasick String Matching Algorithm. It uses a trie to match the subject against all of the target strings in a single pass (it's also quite simple to implement).
Equal hash values do not guarantee that the strings are equal; a hash mismatch does guarantee that they differ, though. If you're going to compare a lot of strings against your collection, then a hash would be great; if it's a one-off comparison (unlikely, I guess), then strcmp will do nicely.
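To make the "hash first, then confirm" idea concrete, here is a minimal sketch. It assumes a small hypothetical keyword list and uses 32-bit FNV-1a only because it fits in a few lines; it is not MurmurHash, and any decent string hash could be swapped in. The names fnv1a, keywords, and find_keyword are made up for illustration.

```c
#include <stdint.h>
#include <string.h>

/* 32-bit FNV-1a: an illustrative stand-in for whatever hash you pick. */
static uint32_t fnv1a(const char *s)
{
    uint32_t h = 2166136261u;          /* FNV offset basis */
    for (; *s != '\0'; ++s) {
        h ^= (unsigned char)*s;
        h *= 16777619u;                /* FNV prime */
    }
    return h;
}

/* Hypothetical constant strings to compare against. */
static const char *const keywords[] = { "start", "stop", "pause", "resume" };
enum { NKEYWORDS = sizeof keywords / sizeof keywords[0] };

/* Returns the index of s in keywords, or -1 if it is not present. */
static int find_keyword(const char *s)
{
    /* Hashes of the constants, computed once on first use
       (this lazy setup is not thread-safe). */
    static uint32_t keyword_hash[NKEYWORDS];
    static int ready = 0;
    if (!ready) {
        for (int i = 0; i < NKEYWORDS; ++i)
            keyword_hash[i] = fnv1a(keywords[i]);
        ready = 1;
    }

    uint32_t h = fnv1a(s);             /* hash the subject only once */
    for (int i = 0; i < NKEYWORDS; ++i) {
        /* A hash mismatch proves inequality; a hash match still
           needs strcmp, because different strings can collide. */
        if (keyword_hash[i] == h && strcmp(keywords[i], s) == 0)
            return i;
    }
    return -1;
}
```

With this list, find_keyword("pause") returns 2 and find_keyword("quit") returns -1. For strings of around 7 characters the integer comparison mostly saves strcmp calls on non-matching entries, so whether it beats plain strcmp is something to measure.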
It greatly depends on the length of the strings and the complexity of your hash function. Implementing both approaches and benchmarking them yourself would give the best answer.
If your constant strings are known at compile time, take a look at the idea of a "perfect hash".
Wikipedia: A perfect hash function for a set S is a hash function that maps distinct elements in S to distinct integers, with no collisions.
That "no collisions" thing saves you work. Possibilities for further reading and implementations are:
If you have a static list of strings, I would store them in a sorted array and then use bsearch to determine whether a string is in that list. It returns NULL if the string is not found, or a pointer to the matching element if it is, and it is probably faster than a linear search or hashing.
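A minimal sketch of that approach, assuming a hypothetical word list that is already sorted in strcmp order (sorted_words, cmp_key_to_elem, and is_known_word are illustrative names, not from any library):

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical constant strings; they MUST be in strcmp order. */
static const char *const sorted_words[] = { "pause", "resume", "start", "stop" };

/* bsearch passes the key first and a pointer to an array element second.
   The key is a plain string; each element is a pointer to a string. */
static int cmp_key_to_elem(const void *key, const void *elem)
{
    const char *k = key;
    const char *const *e = elem;
    return strcmp(k, *e);
}

/* Returns 1 if s is one of sorted_words, 0 otherwise. */
static int is_known_word(const char *s)
{
    size_t count = sizeof sorted_words / sizeof sorted_words[0];
    return bsearch(s, sorted_words, count,
                   sizeof sorted_words[0], cmp_key_to_elem) != NULL;
}
```

Here is_known_word("stop") returns 1 and is_known_word("quit") returns 0. Each lookup costs about log2(n) strcmp calls, so for a handful of short strings the difference from a linear scan is likely small.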